Stan Posey, CAE Industry Development

NVIDIA, Santa Clara, CA, USA

NVIDIA and HPC Evolution of GPUs


Public, based in Santa Clara, CA | ~$4B revenue | ~5,500 employees
Founded in 1993 (public since 1999) with primary business in semiconductor industry
Products for graphics in workstations, notebooks, mobile devices, etc.
Began R&D of GPUs for HPC in 2004, released first Tesla and CUDA in 2007

Development of GPUs as a co-processing accelerator for x86 CPUs

HPC Evolution of GPUs
2004: Began strategic investments in GPU as HPC co-processor
2006: G80, first GPU with built-in compute features, 128 cores; CUDA SDK Beta
2007: Tesla 8-series based on G80, 128 cores; CUDA 1.0, 1.1
2008: Tesla 10-series based on GT200, 240 cores; CUDA 2.0, 2.3
2009: Tesla 20-series, code named Fermi, up to 512 cores; CUDA SDK 3.0, 3.2
ANSYS 2011 Regional Conferences | 25 Aug 2011 | Palm Beach Gardens, FL

3 Years With 3 Generations

Research Example: CFD for Gas Turbine Engines


Siemens UK 3-Stage Compressor Rig
160M grid points
5 revolutions
23K time steps
24 hours for each revolution on 32 GPUs

University of Cambridge DARWIN Cluster
CUDA Center of Excellence since 2008
GPU subcluster:
Dell T5500 servers, 32 dual-socket CPUs
Tesla S1070 GPUs, 4 GPUs per socket for total 128 GPUs

Turbostream simulation speedup: ~19x

Sources: www.many-core.group.cam.ac.uk/ukgpucc2/talks/Brandvik.pdf | www.turbostream-cfd.com/ | www.hpc.cam.ac.uk/services/darwin.html

NVIDIA and ANSYS Collaboration Focus

NVIDIA GPU status by discipline (Structural Mechanics, Fluid Dynamics, Electromagnetics)

Available today
Structural Mechanics: ANSYS Mechanical 13 (SMP, single GPU)
Electromagnetics: ANSYS Nexxim (signal integrity)

Updates for 2011
Structural Mechanics: ANSYS Mechanical 14 (DMP, improved PCG)
Fluid Dynamics: ANSYS CFD 14 (radiation HT, beta)

Product evaluation
Structural Mechanics: ANSYS Mechanical 15 (multi-GPU, multi-node)
Fluid Dynamics: ANSYS CFD 15 (solver, other models)
Electromagnetics: ANSYS HFSS

Research evaluation
Electromagnetics: ANSYS Maxwell

NVIDIA provides business and engineering investments in ANSYS technology developments

How ANSYS Software Works With GPUs

ANSYS computes the heavy workloads of matrix solvers on the GPU and other routines on the CPU
ANSYS Mechanical GPU acceleration is user-transparent
Jobs launch and complete without additional user steps

1. ANSYS job launched on CPU
2. Solver operations sent to GPU
3. GPU sends results back to CPU
4. ANSYS job completes on CPU

Important Considerations for ANSYS and GPUs

Core ANSYS focus is on direct and iterative linear solvers
Others (models, matrix assembly) move to GPUs in progressive stages

Most ANSYS software employs a domain-parallel method
GPU computing fits this method, preserves DANSYS (Distributed ANSYS) investments
ANSYS 13 focus was SMP solvers; ANSYS 14 focus is DANSYS solvers

ANSYS software is parallel and scales well for multicore CPUs
Direct solvers use a scheme of computations on both GPU and CPU
Iterative solvers have computations on GPU, matrix assembly on CPU
Investigations include GPU performance against multicore CPU only
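The split described on this slide (matrix assembly on the CPU, iterative solver arithmetic on the GPU) can be illustrated with a small Jacobi-preconditioned conjugate gradient (JCG) loop. This is a minimal sketch in plain Python, not ANSYS code: `assemble_1d_stiffness`, `matvec`, `dot`, and `jcg` are hypothetical names, and the small dense tridiagonal matrix is only a stand-in for an assembled stiffness matrix. The matrix-vector product and vector updates inside the loop are the kind of work an accelerator would typically take over.

```python
def assemble_1d_stiffness(n):
    """'CPU-side' assembly step: dense tridiagonal [-1, 2, -1] matrix (SPD)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0
        if i > 0:
            a[i][i - 1] = -1.0
        if i < n - 1:
            a[i][i + 1] = -1.0
    return a


def matvec(a, x):
    """Matrix-vector product; the dominant cost of each iteration."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive definite matrix a.

    Returns (x, iterations). Uses the inverse diagonal of a as preconditioner.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner
    x = [0.0] * n
    r = list(b)  # residual b - a x for the initial guess x = 0
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


a = assemble_1d_stiffness(5)
x, iterations = jcg(a, [1.0] * 5)
```

In this toy setup only `assemble_1d_stiffness` would stay on the host; everything inside the `jcg` loop is dense arithmetic on vectors, which is why iterative solvers are a natural first target for offload.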

ANSYS Presentation at NVIDIA GTC 2010


Sep 20-23, 2010 | San Jose Convention Center, San Jose, California, USA

Accelerating System Level Signal Integrity Simulation with GPU
Dr. Ekanathan Palamadai, ANSYS

Nexxim 13.0 Convolution

Results for Tesla C2050 (lower is better):
Intel Nehalem 8-core CPU, OpenMP: 108 h
NVIDIA Tesla C2050 GPU, OpenMP: 4 h
Single precision: ~27x
Double precision: ~13x
Speedup combines GPU and other SW changes

ANSYS CFD 14.0 to Offer (Beta) GPU Capability


NOTE: Growing CPU time of view-factor computations inhibits proper inclusion of radiation HT effects

NOTE: GPU time remains low even as view-factor computations grow very large

ANSYS CFD preliminary results of radiation heat transfer view-factor computation on GPUs vs. CPUs

Radiation HT applications:
Underhood cooling
Cabin comfort HVAC
Furnace simulations
Solar loads on buildings
Combustor in turbine
Electronics passive cooling

Other ANSYS CFD evaluations:
Models (e.g. disperse phase)
Implicit equation solvers
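To make the cost growth of view-factor computation concrete, here is a minimal sketch in plain Python. It is not the ANSYS CFD algorithm: it assumes small flat patches, unit normals, and no occlusion, and uses the standard point-to-point approximation F(i→j) ≈ cos θ_i · cos θ_j · A_j / (π r²). The names `patch_view_factor` and `view_factor_matrix` are hypothetical. The point it illustrates is that N patches need N·(N−1) independent pair evaluations, which grows quadratically on a CPU but maps well onto many GPU threads.

```python
import math


def patch_view_factor(ci, ni, cj, nj, area_j):
    """Approximate view factor from small patch i to small patch j.

    ci, cj: patch centers (x, y, z); ni, nj: unit normals; area_j: area of j.
    Occlusion by other geometry is ignored in this sketch.
    """
    d = [b - a for a, b in zip(ci, cj)]  # vector from i to j
    r2 = sum(c * c for c in d)
    if r2 == 0.0:
        return 0.0
    r = math.sqrt(r2)
    cos_i = sum(n * c for n, c in zip(ni, d)) / r
    cos_j = -sum(n * c for n, c in zip(nj, d)) / r
    if cos_i <= 0.0 or cos_j <= 0.0:
        return 0.0  # the patches do not face each other
    return cos_i * cos_j * area_j / (math.pi * r2)


def view_factor_matrix(patches):
    """All-pairs view factors; patches is a list of (center, normal, area).

    Returns the matrix and the number of pair evaluations performed.
    """
    n = len(patches)
    f = [[0.0] * n for _ in range(n)]
    pair_evals = 0
    for i, (ci, ni, _) in enumerate(patches):
        for j, (cj, nj, aj) in enumerate(patches):
            if i != j:
                f[i][j] = patch_view_factor(ci, ni, cj, nj, aj)
                pair_evals += 1
    return f, pair_evals


# Two parallel patches facing each other, one unit apart.
patches = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.01),
    ((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), 0.01),
]
f, pair_evals = view_factor_matrix(patches)
```

Because every `f[i][j]` entry depends only on its own pair of patches, the double loop has no ordering constraints; that independence is what makes the computation a plausible fit for a GPU.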

ANSYS Announcement of NVIDIA CUDA Support

"This initial development for GPU computing demonstrates our focus on evolving ANSYS software to take advantage
of important technology trends in high-performance computing," said Dipankar Choudhury, vice president of
corporate product strategy and planning at ANSYS. "We work to achieve optimized software performance, across the
full spectrum of HPC technologies, so that our customers get maximum value from their investment in HPC. Here, our
technical collaboration with NVIDIA has resulted in a significant benefit for our mutual customers."

Details of ANSYS Mechanical for NVIDIA GPUs

ANSYS Mechanical 13: Collaboration on SMP direct sparse and PCG/JCG iterative solvers; CUDA 3.2 support in 13.0 SP2
Initial release for both Linux and Windows 64-bit, and single GPU per job; multi-GPU under evaluation for future release
Model limits for direct depend on largest front sizes: GPUs good for ~1M DOF to ~8M DOF for 6 GB Tesla C2075 or Quadro 6000
Model limits for iterative depend on GPU memory: GPUs good for ~1M DOF to ~5M DOF for 6 GB Tesla C2075 or Quadro 6000

ANSYS Mechanical 14: Collaboration on DMP solvers; Q4 2011

ANSYS Mechanical 13 Results of GPU Acceleration

NOTE: Results of ANSYS Mechanical for Tesla C2050 and Intel Xeon 5560

Chart: GPU solver kernel speedups
Chart: GPU overall simulation speedups

From NAFEMS World Congress, May 2011, Boston, MA, USA:
"Accelerate FEA Simulations with a GPU" by Jeff Beisheim, ANSYS

System configuration:
Xeon 5560, 2.8 GHz
2 sockets, 8 cores
32 GB memory
Win XP SP2 64-bit
Tesla C2050 GPU

ANSYS Mechanical 14 Preview on GPU Workstation

NOTE: Results Based on ANSYS Mechanical 14.0 Preview 3 DMP Solver Aug 2011 (available Q4 2011)

ANSYS Mechanical Times in Seconds (lower is better)

Cores   Xeon 5670 2.93 GHz Westmere (dual socket)   Xeon 5670 + Tesla C2075   Speedup
1       1848                                        444                       4.2x
2       1192                                        342                       3.5x
4       846                                         314                       2.7x
6       564                                         273                       2.1x
8       516                                         270                       1.9x
12      399                                         -                         -

Chart axis groups the core counts under 1 socket and 2 socket.

NOTE: Add a Tesla C2075 to use with 6 cores: now 46% faster than 12, with 6 available for other tasks

V13sp-5 model: turbine geometry; 2,100K DOF; SOLID187 FEs; static, nonlinear; one load step; direct sparse
Results from HP Z800 Workstation, 2x Xeon X5670 2.93 GHz, 48 GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
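The speedup labels on this slide can be re-derived from the bar values. The sketch below assumes a particular pairing of solve times with core counts: the slide text lists the numbers without explicit pairing, so the pairing is inferred from the fact that it reproduces every printed speedup label and the "46% faster" note. The variable and function names are hypothetical.

```python
# Solve times in seconds. ASSUMPTION: the pairing of times to core counts
# is inferred from the speedup labels, not stated explicitly on the slide.
cpu_only = {1: 1848, 2: 1192, 4: 846, 6: 564, 8: 516, 12: 399}
cpu_plus_gpu = {1: 444, 2: 342, 4: 314, 6: 273, 8: 270}


def gpu_speedups(cpu, gpu):
    """Speedup of CPU+GPU over CPU-only at the same core count."""
    return {cores: cpu[cores] / gpu[cores] for cores in sorted(gpu)}


speedups = gpu_speedups(cpu_only, cpu_plus_gpu)
for cores, s in speedups.items():
    print(f"{cores:>2} cores: {s:.1f}x")

# The slide's note: 6 cores + GPU compared with 12 CPU cores alone.
gain_vs_12 = cpu_only[12] / cpu_plus_gpu[6] - 1.0
print(f"6 cores + GPU vs. 12 cores: {gain_vs_12:.0%} faster")
```

Under this pairing the GPU benefit shrinks as cores are added (from about 4.2x at 1 core to about 1.9x at 8 cores), which matches the downward trend of the printed labels.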

How ANSYS is Licensed for NVIDIA GPUs


ANSYS Base License:

Unlocks up to 2 CPU Cores

ANSYS HPC Pack:

Unlocks up to 8 CPU Cores


Unlocks GPU Acceleration

* Academic customers: GPU acceleration is bundled



ANSYS Cost Performance Gain > 4X vs. Base License

NOTE: Based on ANSYS Mechanical 14.0 Preview 3 DMP Solver Aug 2011 and Model V13sp-5

Factors Gain Over Base License Results

Configuration                    Speedup       Solution cost
Base License, 2 cores            1.0           1.0
ANSYS HPC Pack, 6 cores          2.1 (CPU)     1.23
ANSYS HPC Pack, 8 cores          2.3 (CPU)     1.23
ANSYS HPC Pack, 4 cores + GPU    3.8 (GPU)     1.28
ANSYS HPC Pack, 6 cores + GPU    4.4 (GPU)     1.28

Solution cost basis: ANSYS base license, ANSYS HPC Pack, workstation, Tesla C2075
Performance basis: V13sp-5 model; 2,100K DOF; SOLID187 FEs; static nonlinear; one load step; direct sparse
Results from HP Z800 Workstation, 2x Xeon X5670 2.93 GHz, 48 GB memory, CentOS 5.4 x64; Tesla C2075, CUDA 4.0.17
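One way to read this chart is as speedup gained per unit of relative solution cost. The sketch below computes that ratio from the chart's factors. The ratio itself is an illustrative metric assumed here, not one printed on the slide, and the assignment of each factor to a configuration is inferred from the chart labels; `configs` and `performance_per_cost` are hypothetical names.

```python
# (speedup vs. base license, solution cost vs. base license)
# ASSUMPTION: factor-to-configuration pairing is inferred from the chart.
configs = {
    "Base License, 2 cores": (1.0, 1.0),
    "HPC Pack, 6 cores": (2.1, 1.23),
    "HPC Pack, 8 cores": (2.3, 1.23),
    "HPC Pack, 4 cores + GPU": (3.8, 1.28),
    "HPC Pack, 6 cores + GPU": (4.4, 1.28),
}


def performance_per_cost(speedup, cost):
    """Speedup delivered per unit of relative solution cost."""
    return speedup / cost


ratios = {
    name: performance_per_cost(speedup, cost)
    for name, (speedup, cost) in configs.items()
}
best = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name:<26} {ratio:.2f}")
print(f"Best value: {best}")
```

With these factors, the 6 cores + GPU configuration gives the highest ratio: a 4.4x speedup for roughly 1.28x the base solution cost.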

NVIDIA Use of ANSYS Software for Product Design

ANSYS Icepak: active and passive cooling of IC packages

ANSYS Mechanical: large deflection bending of PCBs

ANSYS Mechanical: comfort and fit of 3D emitter glasses

ANSYS Mechanical: shock & vib of solder ball assemblies

NVIDIA HPC Case Study: Performance Gain of 77x

ANSYS Mechanical simulations by NVIDIA for design of 3D emitter glasses
Simulation for prediction of comfort, fit, and handling
Study optimized on CPU platform before applying GPU
Once neglected model parameterization now practical

Recommended System Configurations

Workstations with Tesla GPUs

Existing System
Tesla C2050 (3 GB)
Tesla C2075 (6 GB)

New System Purchase
Total 6-8 CPU cores
Total 48 GB of CPU memory
Disk with minimum 500 GB
Tesla C2075 + Quadro 2000 for pre/post
-- OR -- Quadro 6000 (6 GB)

Servers with Tesla GPUs

Existing System
Tesla S2050 (12 GB or 3 GB/GPU)

New System Purchase
Total 4 CPUs, 6-8 CPU cores each
Total 4 x16 PCIe (one for each GPU)
Total 96 to 128 GB of CPU memory
Disk with minimum 2000 GB (scratch)
Tesla M2070 or Tesla M2090

Summary and Next Steps


ANSYS Software supports NVIDIA GPUs for Computation
ANSYS 13.0 since Nov 2010; New features coming in ANSYS 14.0

Joint Collaboration on ANSYS 13.0 is only the beginning
Collaboration ongoing in all disciplines of CSM, CFD and CEM

Learn more about ANSYS and NVIDIA GPU solution
More at: www.nvidia.com/object/teslaansysaccelerations.html
Want to try ANSYS on NVIDIA GPUs? Contact cae@nvidia.com

Thank You, Questions?


Stan Posey | CAE Industry Development | sposey@nvidia.com
NVIDIA, Santa Clara, CA, USA
