Introduction
Building a supercomputer containing millions of interconnected high-speed, high-power components is no trivial task. Before thousands of printed circuit board assemblies (PCBAs) are installed to create the complete system, a thorough testing phase is essential, both to verify functionality and to identify defects that could harm interconnected components and circuit boards. Large, complex boards already pose a test development challenge; when they are destined for a liquid-cooled system, managing thermal requirements during test development compounds it. A leading high-performance computing (HPC) manufacturer sought a collaborator to develop test systems for its boards as part of a groundbreaking supercomputer project. Benchmark brought the experience and expertise needed to define a test strategy and to develop the hardware and software required to deliver fully tested boards.
Benchmark’s customer designed a supercomputer leveraging a combination of the latest high-end CPU and GPU technology. In collaboration with another major computing technology leader, the customer designed and developed a system that consolidates the capabilities of many computers into one cohesive unit. The two companies devised a modular setup, connecting cabinets filled with large PCBAs and components to build a single, powerful computing system. Like many modern HPC systems, this one relied on liquid cooling to handle the significant thermal management requirements of the high-power components on multiple large, dense boards.
Faced with the significant task of interlinking numerous components within such a vast system, the team employed a modular strategy that broke assembly and testing into manageable segments. The collaboration also introduced many other innovations, including unique high-performance interconnect technology and platforms that facilitate the smooth integration of the system’s many sub-assemblies. Additionally, a custom storage system was designed to manage the massive volume of data the system was expected to process.
The Challenge
The system is composed of multiple water-cooled equipment racks housed within robust metal cabinets, each containing miles of networking cables that connect modules, or blades, so they operate as a unified computational system. In such a system, the failure of a single connection or component causes major setbacks in bringing the system online. Each component and connection must be tested, but standard test fixtures rely on air cooling, which would not be sufficient for these dense blades. The customer needed a unique test solution capable of validating thousands of electronic connections while maintaining fluid connections on each blade, ensuring error-free assembly to achieve the intended level of computational power.
The Solution
The project called for manufacturing tests that were both rapid and precise, which required meticulous planning. To support this, Benchmark engaged a fixture fabrication partner to help implement practical measurement solutions for complex test scenarios, including the sophisticated blades and PCBs destined for an innovative supercomputer. By leading the partnership and defining testing methodologies, Benchmark ensured the blades and their multiple daughter PCBAs were thoroughly exercised. This approach streamlined the process, resulting in a highly efficient test system capable of conducting numerous board-level measurements swiftly.
The customer’s test criteria included ensuring all circuit interconnections were intact and identifying any assembly flaws in the PCBAs (due to issues like soldering, material anomalies, and component placement tolerances). The software the Benchmark team developed was instrumental in facilitating these evaluations. By strategizing on measurement techniques together, the teams created a process that enabled testing of thousands of computer blades within a manageable time limit. Benchmark’s test development engineers outlined a comprehensive and practical manufacturing test strategy incorporating multiple testing stations.
Inspecting each blade and its boards encompassed three measurement types: in-circuit testing (ICT), functional circuit testing (FCT), and fluid purge testing. Fluid purge testing is critical for verifying the integrity of each blade’s cooling channels and for ensuring they are free of any fluid that could cause damage under freezing conditions during shipping or storage prior to installation.
Board-level ICT stations are pivotal in assessing a PCBA after assembly, focusing on soldered connections. These tests check for open and short circuits and verify the integrity of connections, especially for components with many pins or ball grid arrays (BGAs). Applying a known voltage at one point in a circuit and measuring the response elsewhere helped detect opens and shorts, which is essential for catching faults not easily seen otherwise. Specialized equipment ensured precise and dependable ICT results.
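As a rough illustration of the open/short check described above, the Python sketch below classifies resistance readings taken along and between nets. It is not Benchmark's test software: the function name `find_faults`, the net names, and both thresholds are hypothetical, and real ICT limits depend on the board and the tester.

```python
# Hypothetical limits, for illustration only.
SHORT_MAX_OHMS = 10.0      # two separate nets reading below this are treated as shorted
OPEN_MIN_OHMS = 1_000.0    # a single net reading above this end to end is treated as open


def find_faults(continuity, isolation):
    """Return a list of (fault_type, location) tuples.

    continuity: {net: ohms measured end to end along that net}
    isolation:  {(net_a, net_b): ohms measured between two separate nets}
    """
    faults = []
    # A net that should conduct but shows high resistance is an open.
    for net, ohms in sorted(continuity.items()):
        if ohms > OPEN_MIN_OHMS:
            faults.append(("open", net))
    # Two nets that should be isolated but show low resistance are shorted.
    for (net_a, net_b), ohms in sorted(isolation.items()):
        if ohms < SHORT_MAX_OHMS:
            faults.append(("short", f"{net_a}-{net_b}"))
    return faults


if __name__ == "__main__":
    # Made-up readings: CLK is broken, and CLK is bridged to DATA0.
    continuity = {"VDD": 0.2, "CLK": 5e6}
    isolation = {("VDD", "GND"): 2e6, ("CLK", "DATA0"): 1.5}
    print(find_faults(continuity, isolation))
    # [('open', 'CLK'), ('short', 'CLK-DATA0')]
```

Keeping the continuity and isolation readings in separate tables mirrors the two questions the test asks: does each net conduct where it should, and is each net isolated from its neighbors.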
X-ray inspection was also employed in the manufacturing process to spot voids in solder or component misalignment.
FCT comes next, confirming that each assembly functions as designed. Because the supercomputer combines high speed with high power, the boards needed active cooling while being functionally tested so that none of the expensive high-power components overheated when powered on. This again required collaboration between the customer and Benchmark’s mechanical design team to develop and fabricate a unique FCT cooling solution.
While standard testing apparatus like oscilloscopes and multimeters are used, the precision of test probes is vital, considering the type of solder connections and the high-speed nature of the blades and system. With PCB miniaturization, FCT often faces challenges due to smaller circuit attachments and component packages. The delicate nature of solder joints, especially with miniaturized components on soft circuit materials (like PTFE-based ones), necessitates minimal force application during testing. FCT fixtures were, therefore, designed considering component types to avoid any damage during testing phases.
A key material characteristic, the coefficient of thermal expansion (CTE), indicates how much a material’s dimensions change with temperature variations, like those experienced during soldering. Excessive or mismatched expansion can compromise the reliability of connections, a critical aspect monitored during FCT for the supercomputer’s components.
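For reference, the linear form of the CTE relationship is commonly written as shown below, where \alpha is the linear CTE, L_0 is the original length, and \Delta T is the temperature change. The numbers in the worked line are illustrative assumptions only and do not describe the materials used in this project.

```latex
\Delta L = \alpha \, L_0 \, \Delta T

% Illustrative values (assumed, not from the project):
% \alpha = 15 \times 10^{-6}\ /^{\circ}\mathrm{C}, \quad L_0 = 100\ \mathrm{mm}, \quad \Delta T = 200\ ^{\circ}\mathrm{C}
\Delta L = (15 \times 10^{-6}\ /^{\circ}\mathrm{C}) \times 100\ \mathrm{mm} \times 200\ ^{\circ}\mathrm{C} = 0.3\ \mathrm{mm}
```

When two joined materials have different CTE values, the difference in their expansion over the same temperature swing is what places stress on the solder joints between them.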
To meet the required testing volume within a feasible period, Benchmark’s proprietary Process Feedback System (PFS) was instrumental. Operating within the shop floor control system (SFCS), it managed multiple test stations and streamlined measurements across various PCBs to reduce overall testing duration. The SFCS, overseen by skilled engineers, coordinates the workflow across the different testing stages.
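The effect of running several test stations in parallel can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of how PFS or the SFCS schedules work: the function `total_test_hours`, the 30-minute test time, and the station counts are assumptions chosen for illustration. Only the blade count of roughly 10,000 comes from this case study.

```python
import math


def total_test_hours(blades, minutes_per_blade, parallel_stations):
    """Estimate wall-clock hours when stations test blades in parallel.

    Assumes every blade takes the same time and ignores handling,
    retest, and changeover time (a deliberate simplification).
    """
    # Blades are processed in rounds of `parallel_stations` at a time.
    rounds = math.ceil(blades / parallel_stations)
    return rounds * minutes_per_blade / 60


if __name__ == "__main__":
    blades = 10_000          # approximate blade count from the case study
    minutes_per_blade = 30   # assumed test time, for illustration only
    for stations in (1, 8):  # assumed station counts
        print(stations, total_test_hours(blades, minutes_per_blade, stations))
    # 1 5000.0
    # 8 625.0
```

Under these assumptions, going from one station to eight cuts the estimate from 5,000 hours to 625 hours, which shows why coordinating multiple stations matters at this volume.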
The Results
The collaborative effort between Benchmark and the customer to create and refine the manufacturing testing process for the supercomputer blades yielded remarkable outcomes. The strategic integration of multiple tests, including ICT, FCT, and fluid purge testing, produced a streamlined workflow capable of efficiently assessing the integrity and performance of thousands of blades within the targeted time limit. This comprehensive testing strategy not only ensured the high quality and reliability of each blade but also significantly reduced the potential for delays in the production schedule. By implementing the Process Feedback System (PFS) within the shop floor control system (SFCS), the team could dynamically adjust testing protocols based on real-time data, further optimizing the testing process.
The ability to conduct simultaneous measurements on multiple PCBs drastically cut total test times, making the ambitious goal of testing nearly 10,000 blades a reality. Moreover, test fixtures designed with integrated cooling, together with advanced testing technology such as X-ray inspection for solder quality and precision probes for FCT, addressed the challenges posed by the ongoing miniaturization of PCB components. These strategies helped mitigate potential issues related to component packaging and circuit material behavior under various conditions, thereby maintaining the integrity of each blade’s connections and overall functionality.
The success of the testing strategy for the supercomputer blades represents a significant achievement in the field of high-performance computing manufacturing, setting a new standard for quality assurance and efficiency in the production of complex electronic systems. Through this collaborative endeavor, the project team demonstrated the value of innovative testing methodologies and the power of partnership in overcoming technical challenges.
About Benchmark
Benchmark provides comprehensive solutions across the entire product lifecycle, leading through its innovative technology and engineering design services, leveraging its optimized global supply chain and delivering world-class manufacturing services. The industries we serve include commercial aerospace, defense, advanced computing, next-generation telecommunications, complex industrials, medical, and semiconductor capital equipment.
