gem5, GPGPUSim, McPAT, GPUWattch, Your favorite simulator here
gem5, GPGPUSim, McPAT, GPUWattch, "Your favorite simulator here" Considered Harmful
Tony Nowatzki, Jaikrishnan Menon, Chen-Han Ho, Karthikeyan Sankaralingam
University of Wisconsin-Madison
tjn@cs.wisc.edu, menon@cs.wisc.edu, ho9@wisc.edu, karu@cs.wisc.edu
Much as Dijkstra, in 1968, observed the dangers of relying on the goto statement, we observe that the dominant reliance on quantitative simulators is having a detrimental effect on our field. Over time, simulator tools have become more sophisticated. From the simple days of the now debunked SimpleScalar with its RUU-based OOO model with fixed DRAM latency, to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated tools. We argue, though, that new generations of simulators are often overfitted to certain benchmarks or configurations for validation and can have significant modeling errors that researchers are not aware of. Though the existence of these errors is unsurprising, they can cause unaware users to derive incorrect conclusions. Simultaneously, and even more problematic, is that reviewers demand researchers inappropriately use these tools. We enumerate eight common, but not acknowledged or recognized pitfalls of simulators or simulator use, considering four modern simulation infrastructures. We propose that the evaluation standards for a work should match its "footprint," the breadth of layers which the technique affects, and conclude with our opinion on how to escape out of our field's simulate-or-reject mindset.
[Figure 1 labels recovered from the image. Stack layers: Algorithm, Application, Compiler, OS, IO, Mem Controller, Caches, Core Microarch, Circuits, Gates, Transistors, Physics. Footprints: Small Footprint, Medium Footprint, Large Footprint. Evaluation approaches: Basic Energy Characterization, Mathematical Proof, Custom First-Order Models, Program Analysis, Reasoned Arguments, Cycle Accurate Simulation. Decision labels: "How do I evaluate my idea?", "Best Research Approach?", "Satisfy Program Committee?", "Cycle Accurate Simulation".]
1. Introduction
For a number of years we have been familiar with the observation that the quality of architecture researchers is a decreasing function of the reliance on quantitative architecture simulators in the architecture papers they produce. More recently we discovered why the use of architecture simulators has such disastrous effects, and we became convinced that the architecture simulator should be abolished from all "higher level" architecture research. At that time we did not attach too much importance to this discovery; we now submit our considerations for publication (footnote 1). Much as Dijkstra observed the era of reliance on the goto statement was having a negative effect, we observe the era of over-reliance on quantitative simulators is having a detrimental effect on the field, and should come to an end.
We observe that simulation, in particular "detailed" tools that provide cycle-accurate performance estimates, area estimates, power and energy estimates, as a vehicle for architecture research is ubiquitous. From the simple days of the now debunked SimpleScalar [8] with its RUU-based OOO model + fixed memory to the gem5 + DramSim + GPGPUSim + McPAT mashup simulator, we have come a long way in what architects are claiming as validated simulators (footnote 2). This level of added detail has led to the belief that we have better tools and are doing better and better quantitative evaluation. It has also led to the preponderance of papers relying on such tools and has created an implicit standard and template of how quantitative evaluation must be done. This reliance and belief in such detailed tools is hurting the field and creating various pitfalls. Part of the problem is that these tools are commonly over-fitted for validation, meaning that their parameters are tuned such that they are accurate only on a small set of benchmarks or configuration parameters. The implication of overfitting is that simulator models capture the noise rather than the fundamental relationships and tradeoffs. In addition, simulator tools often have significant modeling errors which are not easily accessible by users. Overall, the reliance on simulators is creating many pitfalls both in technical aspects and in hurting the field by distorting reviewer expectations of what entails good quantitative evaluation.
Figure 1: The footprint of a technique (the scope of layers it interacts with), and the choice researchers face between appropriate evaluation and PC-compliant evaluation practices.
Footnote 1: This paragraph is reproduced and criticisms are modified from Dijkstra's seminal A Case against the GOTO Statement [10]. Additions are in italics.
Footnote 2: We remark that not all simulators' authors themselves claim validation.
As a way forward for researchers, we believe that the correct approach depends on the footprint, or layers of the stack which the technique affects or relies on, and that there is no one-size-fits-all solution to architecture research. Figure 1 highlights how different techniques, represented by gray boxes, can affect different stack layers. Unfortunately, it is too often the case that researchers make the choice of research approach based on what will get their paper accepted, rather than what is the most scientific. We revisit a version of this figure with specific examples in Section 4. Most importantly, we believe that reviewers must recalibrate their evaluation standards, and appropriately gauge them to the footprint of the research. We believe this issue is important and vital now as more research in our field is moving toward larger footprints, evidenced by recent keynotes [7] and funding calls [21]. Restricting ourselves to an ill-suited one-size-fits-all approach could curtail scientific advancement of these efforts.
This paper enumerates eight common, but not acknowledged or recognized pitfalls, considering four modern simulation infrastructures: gem5 [5], McPAT [19], GPGPUSim V2.x [3], and GPUWattch [18]. We begin this paper with a section describing errors in popular simulators, which we use to substantiate the pitfalls. In discussing simulator errors and pitfalls, our goal is not to offend or criticize but to inform and provoke thoughtful discussion. We conclude with strategies which can allow us to escape out of our field's templatized simulate-or-reject mindset.
2. Errors in simulators
We begin by first outlining some example instances of errors in mainstream and popular simulators. We believe the existence of these errors should not be surprising, nor are they intended as an attack on particular simulators or simulator authors; any large body of code will have errors. We only bring attention to them to add some context to our pitfalls and aid in substantiation. If anything, our criticism is squarely aimed at users of such tools, for example Govindaraju et al. [12]. Error reports are available at http://www.cs.wisc.edu/vertical/sim-harmful, which have been verified by at least one other person not affiliated with our research group. Their purpose is to point out the type of problems which can be detrimental if users are not aware. In this section, for each tool, we first present observations about an issue, then give our opinions on the issue's implications.
2.1. gem5
To be clear, the errors discussed in this section have only been verified on the X86 version of gem5, and the micro-op issues can only apply to X86. Also, some of the below errors have been communicated to the quite active gem5 community, and we believe these problems can be tackled without difficulty. We revisit the benefits of community driven tools in Section 4.2.
Conservative/Obscure Default for Writeback Mechanism: The gem5 OOO model only schedules instructions for issue if there are guaranteed to be enough "writeback buffers" for them, where the total buffers are calculated by writeback-width × writeback-depth. The default, across all ISAs, is a writeback-depth of 1. This means that if a few long latency instructions hold up writeback-buffer slots, then the effective issue width goes to 0. For an OOO 2-wide core with benchmarks that have long-latency memory references, adding 5 buffer slots increases performance by more than 5X. We do not believe this tradeoff is representative of real OOO designs, and this important parameter is not sufficiently defined in the documentation or source code.
Inconsistent Pipeline Replay Mechanism: gem5's OOO model for speculative instruction scheduling and pipeline replay appears to be both contradictory and unnecessarily conservative. To explain, a deeply pipelined OOO core must speculatively schedule instructions to enable back-to-back execution. When an unexpected latency occurs, the schedule for the miss-dependent instructions needs to be corrected. In gem5, when a load issues to a blocked cache, gem5 conservatively models the "correction" to the speculative schedule by flushing the entire pipeline. The larger issue is that after a pipeline flush, instructions are immediately rescheduled, even if the cache remains blocked. This leads to a cycle of repeated flushing of the entire pipeline. While the performance does not take a significant hit, the amount of energy can double on some benchmarks versus a design with a handful more MSHRs to prevent the cache from blocking.
To be consistent, an architecture which flushes the pipeline on a cache-block should also flush the pipeline on other variable-latency events. However, gem5 does not flush the pipeline on events like cache misses, which would have variable latency. In short, the pipeline replay mechanism is simultaneously both highly conservative and optimistic.
Inefficient/Mislabeled Micro-ops: gem5 micro-ops are optimized more for correctness and economy rather than efficiency. One example is that the same micro-op that performs conditional moves also performs regular register moves. This means that regular moves will incur the dynamic dependence and energy cost of reading the destination register, even though they are completely overwriting it. Also, though the gem5 flag register implementation has greatly improved in recent versions, a few instructions still require extra dependencies and register reads because of flag register grouping. One example is how logical instructions (like XOR) don't write the AF flag, but since it is grouped with the other flags, it must be read before written. This is arguably acceptable, but difficult to understand and access as a user.
An important yet fixable problem is that some micro-ops are [...] which (no fp [...] memory multiplies, [...] up in the data from gem5 to perform energy analysis on floating point code would produce incorrect results by potentially integer factors.
2.2. McPAT
Unclear/Overfitted Functional Unit (FU) Energy Modeling: In the McPAT model, if the core is OOO, then a small dynamic component of energy is added for each FU regardless of whether the FU is being used. This constant is cited as "average numbers from Intel 4G and 773Mhz (Wattch)". Why this occurs in OOO but not in-order processors could be due to overfitting in validation. Another related example is for the per-access energy of an FU. If the processor is "embedded," then this power is divided by two, citing: "According to ARM data embedded processor has much lower per acc energy". Whether or not these (in our opinion) seemingly arbitrary decisions are valid, they are not easily accessible or decipherable by the user, so users may come to incorrect conclusions about the quantitative results.
Error in Pipeline and Clock Power: McPAT calculates an estimate of the pipeline and clock power considering switching factors in pipeline flip-flops. This power is not reported directly in McPAT; rather, it is distributed equally amongst the various processor structures, making it difficult to determine when there are errors. Figure 2 shows the dynamic power which the pipeline contributes for in-order and OOO processors (65nm), which can only be seen by instrumenting the McPAT source code. Our experiments show that this component of power is effectively dropped for all OOO core experiments lasting longer than a few cycles. The error appears to be introduced when converting between power and energy, where a factor of the number of cycles is lost for the OOO core only. This apparent error is in all versions of McPAT that we tested (from v0.7 to v1.1 March 2014). The implication of this error is that it creates uncertainty about the estimation of pipeline and clock power (footnote 3).
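To make the symptom concrete, the following minimal sketch shows how losing a factor of the cycle count in an energy-to-power conversion would make a component vanish for any run longer than a few cycles. This is not McPAT code: the function names, the clock frequency, and the per-cycle energy are all hypothetical, and the actual mechanism in the McPAT source may differ.

```python
# Hypothetical illustration only; not taken from the McPAT source.
CLOCK_HZ = 2.0e9                      # assumed clock frequency
PIPELINE_ENERGY_PER_CYCLE = 1.0e-10   # assumed joules per cycle for pipeline/clock


def pipeline_power_correct(cycles):
    """Average power = (energy per cycle * cycles) / elapsed time."""
    total_energy = PIPELINE_ENERGY_PER_CYCLE * cycles
    elapsed_seconds = cycles / CLOCK_HZ
    return total_energy / elapsed_seconds


def pipeline_power_missing_factor(cycles):
    """Same conversion, but the energy is never scaled by the cycle count."""
    total_energy = PIPELINE_ENERGY_PER_CYCLE  # factor of `cycles` lost here
    elapsed_seconds = cycles / CLOCK_HZ
    return total_energy / elapsed_seconds


if __name__ == "__main__":
    for cycles in (1, 1_000, 1_000_000):
        print(cycles,
              pipeline_power_correct(cycles),
              pipeline_power_missing_factor(cycles))
```

In this toy model the correct conversion is independent of run length, while the faulty one shrinks in proportion to the number of simulated cycles. The two agree only for a one-cycle run, which is consistent with the observation that the component is effectively dropped for experiments lasting longer than a few cycles.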
Figure 2: McPAT pipeline power for a 65nm idle processor.
Footnote 3: For those aware of the exact details of McPAT, when it is used in a fine-grained mode (called every cycle, as opposed to the XML interface of calling at the end of millions of cycles of simulation), this issue will disappear. However, the XML bulk mode is the most prevalent usage of McPAT in the literature.
2.3. GPGPUSim V2.x
In this subsection, we consider a widely adopted version of the GPGPUSim tool, and describe several missing or abstracted components of its architectural model. We discuss GPGPUSim V2.x, even though it is not the latest version of the tool, specifically because many researchers are still using this version [16, 17], and we believe the following claims about its modeling features can be made without controversy. GPGPUSim 3.x has fixed many of these issues (see slide 20 in the tutorial [2]).
Register File microarchitecture: The operand collector (single-ported register file banks + arbiter + X-bar + collector units) is modeled assuming fixed latency accesses to the SRAM with some additional queuing latency. It does not model low-level details, like contention, which impact performance in high-compute bandwidth scenarios.
Thread/warp/wavefront scheduling and dispatch: Thread scheduling is functional, and while a number of different warp scheduling schemes are implemented, these are not modeled in the microarchitecture; they are simply generated functionally.
Branch divergence structures and Branch Unit: Similar to thread dispatch, branch divergence tracking structures are functionally emulated as part of the abstract hardware model, and the branch unit microarchitecture is not modeled at the cycle-level.
The effect of omitting the detailed modeling of these microarchitectural features, and accounting for them abstractly or functionally, is that it encourages architects not to reason about the microarchitectural feasibility of the proposed technique. For example, consider developing and evaluating a non-trivial warp scheduling technique in GPGPUSim V2.x. Its model would be a functional one, meaning that it would not capture the individual components of the hardware, their communication, or their pipeline stages. This would be tantamount to a CPU load-store queue design evaluation which functionally models the dependence predictor, while ignoring cache port contention etc. For the CPU domain, this mashup of high-level modeling and low-level simulation would not be considered sufficient to understand the effectiveness of a technique quantitatively.
2.4. GPUWattch
Given the straightforward reading of the GPUWattch [18] paper, its methodology has a form of modeling error which we call "mathematically irrelevant" modeling. We define this as a model which, when taken as a whole, contains mathematically-irrelevant sub-components. The first part of this subsection will describe how this form of error applies to the methodology (as presented) in Leng et al. [18]. Essentially, the detailed modeling using McPAT, empirical memory models and synthesis based models isn't meaningful to the final obtained prediction. However, additional unpresented details of the GPUWattch methodology help justify the detailed modeling. Therefore, we will subsequently discuss some of these details, and conclude with the implications for appropriate model usage.
GPUWattch Power Modeling (as presented): GPUWattch models the cycle-level power of GPU architectures by first using GPGPUSim to obtain activity factors. Then, GPUWattch calculates the dynamic power of a particular benchmark, $P_{bench}$, as the sum of the activity factors $\alpha_{bench,comp}$, multiplied by the maximum power of the component, $PMAX_{comp}$. The $PMAX_{comp}$ power value is obtained through highly detailed modeling using a combination of McPAT-based modeling, empirical models, and synthesis-based models. The GPUWattch authors state that since McPAT is tuned to CPUs, and since there are many undocumented GPU features, they need to correct for this by adding an error term for each component, $x_{comp}$, and use least-squares estimation (linear regression) to estimate the errors. Their final model for dynamic power is:
$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times PMAX_{comp} \times x_{comp} \qquad (1)$$
In linear regression terminology, $\alpha_{bench,comp} \times PMAX_{comp}$ are the explanatory or input variables, $x_{comp}$ are the regression coefficients and $P_{bench}$ is the dependent variable. At this point, the authors' methodology is as follows:
"We iteratively refine the power model on the basis of the sources of the various inaccuracies that LSE [regression] identifies. For instance, in our infrastructure (i.e., McPAT) the power estimation for certain components is biased toward CPU implementations. We narrow the resulting inaccuracy gap for the GPU power model by fixing our initial assumptions about the implementation and then applying the scaling factors that are obtained from LSE."
We contend that a direct interpretation of their methodology would be to run the regression in Equation 1 using measured values of $P_{bench}$ of some microbenchmarks to find the componentwise errors $x_{comp}$, then modify the source code to multiply the original componentwise power $\alpha_{bench,comp} \times PMAX_{comp}$ by the "scaling factor" for that component, $x_{comp}$, to obtain the final power estimate. Note that this procedure is performed on a platform specific basis (footnote 4).
If this methodology was directly applied, the computation of PMAX is mathematically irrelevant. Performing a linear transformation on the explanatory variables of a linear regression does not affect the error or prediction accuracy. In fact, running the below regression, which does not have $PMAX_{comp}$ values, would be mathematically equivalent, and the resulting regression coefficients are simply scaled as follows: $x'_{comp} = PMAX_{comp} \times x_{comp}$.
$$P_{bench} = \sum_{comp} \alpha_{bench,comp} \times x'_{comp} \qquad (2)$$
What this means is that the $PMAX_{comp}$ variables are mathematically meaningless to the final model. Therefore, users who apply the GPUWattch methodology as written will put unnecessary effort into detailed power modeling (which would include McPAT, empirical model and synthesis model development).
GPUWattch Power Modeling (as implemented): The methodology which is implemented actually does employ the detailed power modeling results during the scaling parameter selection for some purpose, as we explain next (footnote 5).
First, instead of scaling the internal power values of McPAT's optimization framework, what is actually scaled are the activity counts which are fed as inputs to McPAT. This assumes that McPAT's choice of components would be unaffected by the different power scaling factor applied to various components.
Second, instead of automatic linear regression, they calculate the root mean square error of their predictions and manually modify the scaling coefficients to reduce the error. This alone would be the manual equivalent of linear regression, and hence would still have the mathematical irrelevance bug. However, the authors also bound the scaling coefficients by 10× and 50× for on-core and off-core components respectively (here, the authors explain that the bound is chosen based on the confidence in the original detailed model). The authors observe that without bounding the scaling coefficients, the error is actually less: a mathematically "better" model. However, the per-component breakdowns with pure linear regression do not match expected intuition (too big or negative scaling factors). Therefore, the bounds on scaling factors serve as a rough guideline in attaining a plausible power distribution. Overall, we believe it is possible to use a purely mathematical approach, applying the same type of rough intuition, to achieve the same quality of results without detailed power modeling like McPAT.
What are GPUWattch's appropriate use cases? The methodology behind the GPUWattch power model has implications for its appropriate usage. Our position is that it can only be appropriately employed when a physical artifact with measurable power numbers is available. For the two platforms which have configurations now, the GTX480 was released in 2010, and the Quadro FX5600 is even older. Generating a new GPUWattch configuration requires attaining detailed power measurements, including physically instrumenting the GPU power supply with sensing resistors, followed by an application of the manual error-minimization procedure described above.
Footnote 4: Considering the XML files provided in the tool for GTX480 and Quadro FX5600, the scaling coefficients are the series of 32 param names starting at line 31 (TOT_INST, FP_INT, IC_H, etc.). In the source code, in gpgpu_sim_wrapper.c, these are used in methods like set_inst_power, set_regfile_power etc. to scale up the McPAT computed values.
Footnote 5: The authors graciously provided us details on their methodology, and we take the blame if we have made any mistakes in reproducing it here.
The reason why a physical artifact is necessary is that the scaling factors, $x_{comp}$, are platform specific. As an example, consider the power of register file access in the GTX480 and FX5600. The McPAT scaling between the two designs does not capture their architectural differences, which shows up in the GPUWattch model as the ratio of their register file scaling factors, which is 1.7×.
If we want to consider a hypothetical GPU with different configuration parameters, without changing the scaling factors, we should not expect the GPUWattch power model to be valid. To explain, the average scaling factor magnitude is 22× for the GTX480, and 8× for Quadro FX5600. To claim that the hypothetical GPU configuration is valid, the argument that would have to be made is that McPAT gets the power wrong by an order of magnitude, but somehow can get the relative scaling of components correct. This is a position we believe is untenable without evidence. The lack of validated configurability would impede an accurate design space exploration.
Good uses for GPUWattch would include estimating the energy impact of policy changes which affect the activity factors, or adding components which have externally validated power characteristics (again, if the target architecture already has a GPUWattch power model). Revisiting the concept of footprint, these are both small-footprint evaluation scenarios. We clarify here that the authors of GPUWattch never mention design space exploration as something their tool is meant for. So again, our criticism is aimed at tool users rather than developers, and additionally the reviewer who now thinks energy estimation for GPU research is always doable.
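The "mathematically irrelevant" argument made for Equations 1 and 2 earlier in this subsection can be checked numerically. The sketch below fits both regressions on the same data and compares them. It is not GPUWattch code: the two-component setup, the activity factors, the measured powers, and the PMAX values are invented for illustration, and the least-squares solver is a hand-written closed form for two variables with no intercept.

```python
# Hypothetical data: 4 benchmarks, 2 components. None of these numbers
# come from GPUWattch or from any real GPU.
ALPHA = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.7, 0.6)]  # activity factors
MEASURED = [41.0, 35.0, 33.0, 52.0]                       # measured P_bench
PMAX = (30.0, 12.0)                                       # per-component PMAX


def least_squares_2(rows, targets):
    """Ordinary least squares for two explanatory variables, no intercept,
    solved through the 2x2 normal equations."""
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * y for (a, _), y in zip(rows, targets))
    t2 = sum(b * y for (_, b), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det)


def predict(rows, coeffs):
    """Evaluate the fitted linear model on each benchmark."""
    return [a * coeffs[0] + b * coeffs[1] for a, b in rows]


# Equation 1: explanatory variables are alpha * PMAX, coefficients are x.
scaled_rows = [(a * PMAX[0], b * PMAX[1]) for a, b in ALPHA]
x = least_squares_2(scaled_rows, MEASURED)
pred_eq1 = predict(scaled_rows, x)

# Equation 2: explanatory variables are alpha alone, coefficients are x'.
x_prime = least_squares_2(ALPHA, MEASURED)
pred_eq2 = predict(ALPHA, x_prime)

if __name__ == "__main__":
    print("x       :", x)
    print("x'      :", x_prime)
    print("PMAX * x:", tuple(p * c for p, c in zip(PMAX, x)))
    print("eq1 pred:", pred_eq1)
    print("eq2 pred:", pred_eq2)
```

Under these assumptions the two fits give the same predictions up to floating-point rounding, and each coefficient of the second fit equals the corresponding PMAX value times the coefficient of the first. Changing the values in PMAX changes the fitted x but leaves the predictions untouched, which is the sense in which the detailed PMAX modeling does not influence the final model when the regression is unconstrained.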
3. Pitfalls
This section describes eight pitfalls of modern simulators and simulator usage. For each pitfall, we describe the high-level problem and substantiate our position with empirical evidence. We then give our opinions on how best to avoid the pitfall.
3.1. Pitfall 1: Errors in simulators are inaccessible to users
As outlined above, simulator tools can have significant abstraction, modeling, and specification errors. Furthermore, since simulators are distributed as C/C++ code with little specification, it is difficult for end users to even become aware of these errors. Without understanding whether the simulator is correctly capturing the particular phenomenon a designer is interested in, off-the-shelf usage renders them ineffective for even first-order analysis of effects.
Sometimes, the features of modeling tools are not easily accessible, making bugs difficult to find even with careful data analysis. That is because many of the features are obscured behind implicit assumptions, lack of documentation and lack of good reporting of results. One example is how the pipeline power is reported in McPAT. Since it is implicitly distributed amongst the individual components of the processor, what appears to be a significant error is obfuscated. Errors like this put researchers in a difficult position. Should they go fix the tool which is already validated? And what if another error in the opposite direction is canceling out the effects?
Suggestions: Authors should first validate and sanity check the simulator individually. Further, when it makes sense, they should consider building trace-driven tools that model the first order effects they are aware of, instead of using cycle-accurate tools. We believe it is better to have a tool with known abstraction errors than an unknown black box. Reviewers and the community need to change their mindset as well: having blind faith in "standard tools," while completely discounting other tools, is not appropriate. We revisit the issue of open versus in-house tools in Section 4.
3.2. Pitfall 2: False confidence from validation - over-generalization in simulator papers, or tool misuses
Simulator writers typically make narrow and factually consistent statements about validation, and some examples are below. However, the nature of validation is often misunderstood by users, and these tools are put to use in ways not intended, including making quantitative generalizations.
gem5's OOO model is widely used, but as observed in a recent paper [13] and our observations above, it has several specification errors. Though the gem5 authors themselves do not claim it as such, some do claim it is a "validated simulator." Clearly, this cannot be taken to mean that all effects are modeled. For instance, a technique that works on the instruction front-end must pay attention to gem5's baseline and first fix the specification error described here [13].
Considering McPAT, according to their own documentation and code comments, constants are sometimes chosen to match the validation targets. We agree this is a reasonable decision in some cases, especially when highly customized logic is employed (e.g. functional unit implementations). The danger is when researchers attempt to generalize the results outside the validated processors. These constants will likely not be appropriate.
For GPUWattch, it might be tempting for researchers to perform sensitivity studies by varying McPAT parameters. The path of least resistance would be to use the same scaling factors, instead of measuring the power of a known GPU and deriving new scaling factors using the GPUWattch methodology. For reasons described in the previous section, we argue that without obtaining new scaling factors, this type of sensitivity analysis would be inappropriate.
Suggestions: Use validated simulators with caution. Look for details on the simulator's design and factor those decisions