J2Methods
June 2011
White Paper
Contents
The Supercomputing Buzz
Microsoft Clusters Are Here
The Status Quo
Three Big Questions
Workstation to Cluster Transition
Home Computer Network
COWs as a Windows HPC Feature
Easy COMSOL Installation
Taking a Test Drive
Budget-Cluster Performance
Microsoft Field Support Cluster
Embarrassingly Parallel Computations
Field Cluster Performance
Taming Any Problem Size
Going Large Scale with Azure
The Verdict
© 2011, COMSOL. All rights reserved. This White Paper is published by COMSOL, Inc. and its associated companies. COMSOL and COMSOL Multiphysics are registered trademarks of COMSOL AB. Capture the Concept, COMSOL Desktop, and LiveLink are trademarks of COMSOL AB. Other product or brand names are trademarks or registered trademarks of their respective holders.
environments where Windows administrative support is much more readily available and arguably cheaper. My perception of the single-workstation world is similar. Where die-hard numerical experts typically gravitate toward Linux, the broad mass of engineering practitioners favors the Windows platform, generally because of greater familiarity with its GUI-driven operation. With COMSOL Multiphysics running on all relevant operating systems available today, the choice appears to be simply one of user preference. Personally, I tend toward Linux in single-workstation settings. However, when reviewing Windows HPC Server, my mind experienced a paradigm shift: I realized that it is exactly this GUI familiarity with the Windows OS that makes the product highly relevant in the cluster world and establishes a distinct competitive advantage. I followed this notion and decided to explore whether or not the combination of COMSOL Multiphysics and Windows HPC Server would make practical sense.
2. Do parametric sweeps scale favorably, providing a distinct advantage over single workstations or servers?
3. Can the contiguous memory blocks originally required to solve very large problems be segmented across cluster nodes?
via the specification of *.iso operating system images. However, I did not have any computers to spare that could assume the roles of dedicated compute nodes. I was looking for another way and found it in one of the latest features of Windows HPC Server 2008 R2 Enterprise Edition.
Figure 3: I was wowed by the variety and ease of Windows HPC node deployment methods. For instance, you can make any networked workstation deployable within a minute by installing the tiny Windows HPC Pack. When adding a node to your cluster, you are asked to make a simple choice as shown.
Figure 5: In this COMSOL Multiphysics GUI view, the Vented Loudspeaker Enclosure model from the built-in Model Library is shown after the Cluster Computing node was added. This node establishes the connection to the desired compute cluster and can be added to any model. It is a nifty design that makes switching gears between workstation and cluster computing very convenient.
installation on the head node and sharing it via a UNC path, i.e. \\HEADNODE\comsol42, the installation was as straightforward as on a standalone workstation. While I chose a minimalistic setup and neglected all performance-enhancing recommendations, such as role separation and parallel subnets to handle communication and application data, I was pleasantly surprised by the flexibility of this software system and its ability to be reconfigured on the fly. In this context, it should be noted that the Windows HPC Pack comes free with HPC Server and enables cluster access from any domain workstation. This option would be typical in fast LAN environments and preferable if end users require interaction with COMSOL Multiphysics' high-end graphics capabilities for modeling or report generation purposes.
user choice of the appropriate physics in an intuitive graphical user interface. In this case, the built-in acoustic-structure interaction formulation describes how an acoustic wave interacts physically with a structure, which is exactly what a loudspeaker membrane and its surrounding air pressure field do. COMSOL Multiphysics evaluates this formulation on each of the tetrahedral subdivisions shown in the computational mesh of Figure 6 and, via the finite element method, finds a piecewise continuous solution that spans the entire domain. Within each element the solution is continuous and characterized by polynomial coefficients which represent the unknown variables or degrees of freedom (DOF). The DOF grow with increasing mesh density, a fact we will later use to increase problem size. Among the infinitely many ways to illustrate the results of this computation, one could present a slice plot of the sound pressure field as illustrated in Figure 7 and the mechanical displacement field of the speaker membrane as in Figure 8.
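The details of COMSOL's acoustic-structure interaction formulation are beyond the scope of this paper, but a simplified sketch shows where the DOF come from. Assuming a lossless fluid and ignoring the structural side and its coupling conditions, the pressure acoustics part reduces to a Helmholtz equation, and the finite element method approximates the pressure as a weighted sum of piecewise polynomial basis functions defined on the mesh; the weights are the DOF:

\[
\nabla \cdot \Bigl(-\tfrac{1}{\rho_0}\,\nabla p\Bigr) \;-\; \frac{\omega^2}{\rho_0 c^2}\,p \;=\; 0,
\qquad
p(\mathbf{x}) \;\approx\; \sum_{i=1}^{N_{\mathrm{DOF}}} P_i\,\varphi_i(\mathbf{x}),
\]

where \(p\) is the acoustic pressure, \(\rho_0\) the fluid density, \(c\) the speed of sound, \(\omega = 2\pi f\) the angular excitation frequency, \(\varphi_i\) the basis functions, and \(P_i\) the unknown coefficients. Refining the mesh adds basis functions, which is why halving the element size later in this paper multiplies the DOF count.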
Figure 7: Illustration of the qualitative sound pressure level field sliced along the geometry's plane of symmetry.
Figure 8: Illustration of the qualitative displacement field of the moving speaker components.
Budget-Cluster Performance
When used one at a time, HEADNODE and WORKERNODE carried out baseline sweeps of 32 frequencies at 135,540 DOF in 2,711 and 1,420 seconds, respectively. When HEADNODE was instructed to utilize WORKERNODE as a workstation node, as shown in Figure 9, the same computation took 1,729 seconds.
While this was faster than what HEADNODE could accomplish by itself, the low-budget cluster was slower than WORKERNODE alone. This is probably attributable to cluster network traffic that WORKERNODE does not encounter by itself, my disregard for the performance recommendations, and my choice of the least desirable network topology, Topology 5. After all, this low-budget cluster was not intended to perform but to verify ease of configuration and use in the context of humble hardware resources. Looking back, the configuration of this low-budget cluster took less than a day. And, now that I know what to do, I could probably do it again within one morning while comfortably sipping a cup of coffee. Reaching greater performance would mean investing in additional computing and networking hardware, which is exactly what we did in the old days.
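Expressed as simple ratios of the wall-clock times quoted above, the two-node cluster's standing works out roughly as follows:

\[
S_{\text{vs. HEADNODE}} = \frac{2711\ \text{s}}{1729\ \text{s}} \approx 1.6,
\qquad
S_{\text{vs. WORKERNODE}} = \frac{1420\ \text{s}}{1729\ \text{s}} \approx 0.8,
\]

that is, roughly 60 percent faster than HEADNODE working alone, yet still about 20 percent short of WORKERNODE's single-machine time.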
Figure 9: View of the HPC Cluster Manager during low-budget cluster testing; note that the request for one node fires up one node in the Heat Map.
Figure 10: Network flow chart of Microsoft EEC Field Support Cluster #2; note how compute nodes are isolated on separate private and application networks to represent Network Topology 3 (see also Figure 11)
Embarrassingly Parallel Computations

The most trivial need for parallel computing arises when the goal is to carry out many similar computations. Think of the thousands of customers of an investment bank whose portfolio performance needs to be predicted regularly based on ever-changing investment tactics. Since investment decisions are time-sensitive, it is easy to see that the edge goes to those brokers who can evaluate client portfolios the fastest. Instead of figuring out one client at a time, the idea is to compute all portfolio predictions at once, i.e. in parallel. You can take this further and even fan out the computations for each individual stock. The happy medium will lie somewhere between a feasible hardware price tag and ROI. With Microsoft Excel at the forefront of computations in many industries, it comes as no surprise that Windows HPC Server support for parallel Microsoft Excel was largely driven by the financial industry.

The analogous engineering problem is called a parametric sweep and is exemplified by the vented loudspeaker enclosure of the previous section. The parameter investigated in this case is the excitation frequency of the speaker, which affects both the membrane deformation and the surrounding air pressure. Unlike computing one frequency at a time as done earlier, we will utilize Windows HPC Server to solve as many parameter values as possible at any given moment in time; a conceptual sketch of this pattern follows at the end of this section.

It should be noted that the following computations were carried out ad hoc and in the absence of a highly controlled benchmarking environment. This choice was quite deliberate, to reflect the realistic working conditions of the typical engineer who has to produce results regardless of circumstance. But then, I was being spoiled with the Microsoft cluster, which was configured using the optimal cluster network topology, Topology 3, as detected by the cluster manager. As Figure 11 shows, traffic from the enterprise network is routed through the head node, while cluster traffic is confined to its dedicated private and application networks.

Figure 11: Microsoft Network Topology 3

I accessed this Microsoft cluster by sequentially VPNing into a Microsoft gateway machine and the head node \\node000. While we expected this Microsoft VPN service to be fast and reliable, I will admit that I have never seen anything faster. Such VPN connections are favorably light on WAN traffic and remarkable in their efficiency. However, there is a trade-off in graphics performance, which is inferior to running the cluster from a local workstation, as discussed earlier.
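COMSOL Multiphysics and Windows HPC Server handle the distribution of such sweeps for you, but the underlying pattern is easy to illustrate outside of either product. The following Python sketch is purely conceptual: solve_frequency is a hypothetical placeholder for one complete solve of the loudspeaker model at a single excitation frequency (it is not a COMSOL API call), and the process pool plays the role that the cluster's compute nodes play in the real setup. Because no task depends on any other, all of them can be dispatched at once, which is exactly what makes the problem embarrassingly parallel.

# Conceptual sketch of an embarrassingly parallel parametric sweep.
# solve_frequency is a hypothetical stand-in for a full finite element
# solve at one excitation frequency; it is NOT a COMSOL function.
from concurrent.futures import ProcessPoolExecutor
import math


def solve_frequency(freq_hz):
    """Pretend to solve the loudspeaker model at one frequency."""
    # Dummy workload standing in for the real per-frequency computation.
    response = sum(math.sin(freq_hz * k) / (k + 1) for k in range(200_000))
    return freq_hz, response


if __name__ == "__main__":
    # A 32-point frequency sweep, mirroring the 32 frequencies used above.
    frequencies = [20.0 * 2 ** (i / 4) for i in range(32)]

    # Every frequency is independent of all the others, so the whole sweep
    # can be handed to a pool of workers; a cluster scheduler distributes
    # the same kind of task list across compute nodes.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(solve_frequency, frequencies))

    for freq, response in sorted(results):
        print(f"{freq:8.1f} Hz -> {response:.4f}")

The wall-clock time of such a sweep is governed by the slowest task and the number of available workers, which is why adding nodes pays off directly, as the Field Cluster measurements below show.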
Field Cluster Performance

The configuration of the Cluster Computing node to communicate with the Field Cluster is analogous to that of the low-budget cluster in Figure 9. However, we now have the ability to request 16 nodes. Shortly after invoking the Solve command in COMSOL, the heat map in the cluster manager in Figure 12 lights up and shows all 16 compute nodes at nearly full CPU capacity.

Figure 12: View of the HPC Cluster Manager during Field Cluster testing; note that the request for 16 nodes fires up 16 nodes in the Heat Map.

With this configuration, I was now able to measure computation time with respect to the number of compute nodes assigned. At zero compute nodes, head node node000 did all the work and finished in about 1,000 seconds, or roughly 18 minutes, as shown in Figure 13. Already faster than WORKERNODE, the same figure indicates that this Microsoft test cluster achieved a speedup by a factor of 6, down to 200 seconds, when using all 16 nodes. Of course, there are many dependencies, such as the number of parameters and the problem size. To get a feeling for their significance, I divided the minimum and maximum element size requirements by 2, which increased the DOF from 135,540 to 727,788 and unleashed the economies of scale of our cluster solution. With all 16 nodes engaged, the maximum speedup jumped from less than 6x for 135,540 DOF to more than 11x for 727,788 DOF, as presented in Figures 13 and 14, respectively. Given that engineering computations are routinely measured in days or weeks, an improved turnaround by a factor of 11 is commercially viable.

When running this last set, I noticed that the consumed amount of memory ranged around 15 GB, which made me curious whether or not this larger problem would still run on the low-budget cluster. It did not, which I interpreted as an out-of-memory issue and a perfect entry point for Big Question 3 about taming problem size via memory segregation or decomposition.
Figure 13: Field Cluster performance for embarrassingly parallel computations using a finite element model with 135,540 DOF. At 16 compute nodes, the speedup nearly reaches 6x.
Figure 14: Field Cluster performance for embarrassingly parallel computations using a finite element model with 727,788 DOF. At 16 compute nodes, the speedup exceeds 11x.
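Reading Figures 13 and 14 together in terms of parallel efficiency, the speedup divided by the node count, makes the economies of scale explicit; using the approximate speedups quoted above:

\[
E = \frac{S}{N}:
\qquad
E_{135{,}540\ \mathrm{DOF}} \approx \frac{6}{16} \approx 0.38,
\qquad
E_{727{,}788\ \mathrm{DOF}} \approx \frac{11}{16} \approx 0.69.
\]

The larger model keeps each of the 16 nodes busy for longer per parameter, so fixed per-job overhead such as scheduling and network traffic weighs in less, and efficiency nearly doubles.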
Figure 17: Augmentation of On-Premises Compute Cluster with Windows Azure Data Center [1]
business perspective is, according to Chappell [1], that this allows tilting HPC costs away from capital expense and toward operating expense, something that's attractive to many organizations.
The Verdict
Playing guinea pig as an engineer with elementary IT skills, I was able to understand the available network topologies and configure the low-budget cluster. In fact, the experience was quite enjoyable. A welcome surprise was Windows HPC Server's configuration flexibility and viability in very small networks like my own. Workstation node integration on the fly turns standard business computers into compute nodes and enables the temporary metamorphosis of entire business networks into COWs that play supercomputer on nights and weekends. While the concept is neither very complicated nor new, Windows HPC Server is the first software system that has pulled this vision together feasibly for the mainstream. Out of this world is the ability to manage these changes centrally via one configuration manager without any additional hardware or physical configuration requirements.
Figure 18: Replacement of On-Premises Compute Cluster with Windows Azure Data Center [1]
Exploratory speedup factors of 6x and 11x in the context of embarrassingly parallel COMSOL Multiphysics computations provide a powerful business justification for Windows HPC Server. The ability to divide and conquer by distributing the memory required by a problem of any size allows us to draw conclusions about problems we can't even fathom today. In addition to integrating business networks with traditional HPC clusters, Windows Azure expands the flexible configuration concept to the rapidly growing domain of cloud computing services. The blend of all three provides a powerful tactical toolset that enables us to conquer today's largest and toughest technical challenges. If you have been thinking about a COMSOL cluster solution, there is no time to waste. COMSOL, Inc. has introduced an extremely generous cluster licensing scheme which consumes only one floating network license (FNL) key per cluster. In other words, if you intend to run ten thousand nodes in parallel, you will only need one FNL key.
High-performance computing has entered a new era. The enormous scale and low cost of cloud computing resources is sure to change how and where HPC applications are run. Ignoring this change isn't an option. [1]
References
1. David Chappell, "Windows HPC Server and Windows Azure: High-Performance Computing in the Cloud," September 2010, sponsored by Microsoft Corporation. http://www.microsoft.com/windowsazure/Whitepapers/HPCServerAndAzure/default.aspx
2. Windows HPC Server 2008 R2 Suite Technical Resources. http://www.microsoft.com/hpc/en/us/technical-resources/overview.aspx
3. Windows HPC Server 2008 R2 Suite System Requirements. http://www.microsoft.com/hpc/en/us/product/systemrequirements.aspx
4. COMSOL Multiphysics 4.2 Product Documentation
www.comsol.com
COMSOL, Inc. 1 New England Executive Park Suite 350 Burlington, MA 01803 U. S. A. Tel: +1-781-273-3322 Fax: +1-781-273-6603