GPU Server for HPC Cluster

How hard is it to build a server with four top-of-the-line GPUs for a high-performance computing cluster? Harder than you might think.

When I started building the SCRP cluster back in the summer of 2020, the GPU servers were provided by Asrock Rack. Everything except the GPUs was preassembled. This is the sensible thing to do in normal times.

Fast forward to the summer of 2021, and times were not normal. The supply chain disruption and semiconductor shortage were in high gear. Pretty much every name-brand server manufacturer quoted us months-long lead times, if they were willing to deal with us at all. To get everything in for the new academic year, I constructed a series of servers with parts sourced from different parts of the world. It is actually not that hard to build servers (they are basically heavy-duty PCs with all sorts of specialized parts), unless you want a GPU server suitable for an HPC cluster.

So what is so special about GPU servers for an HPC cluster?

  • Most server cases have seven to eight PCI slots, but I needed at least nine (four dual-slot GPUs plus a single-slot Infiniband network card). There are maybe two manufacturers of such cases that you can find through retail channels.
  • High-end GPUs use a lot of power. A single RTX 3090 draws 350W, so four draw 1400W. Add in the CPU and other components and you are looking at 1800W minimum. A beefy power supply is definitely needed.
  • 1800W ATX power supplies do exist, you say. The problem is, almost no servers use ATX power supplies. They pretty much all use specialized CRPS power supplies, which give you two power supplies in one small package. There are a lot of benefits to this, including redundancy and lower load per power supply. Guess how many 2000W CRPS power supplies you can find through retail channels? ZERO. There is simply too much demand for these things from server manufacturers and too little from retail. I was fortunate enough to have one specially ordered on my behalf by a retail supplier, but it took a while to arrive.
  • Once you have sorted out the parts, assembly comes next. Unless you have one of those highly specialized Supermicro 11-slot motherboards (I am not sure they are even sold at retail), your motherboard will have the width of seven PCIe slots. But you need nine! What do you do? Simple, you might think: all that is needed is a PCIe extension cable. Except you need one end of the cable to go under a GPU, and 99% of the cables you can buy will not be able to do that. I ended up having one custom-made. Yes, custom-made. It’s the silver strip in the photo. Did I mention it was so fragile out of the factory that I ended up strengthening it with hot glue myself?
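
The slot and power arithmetic in the list above can be sketched in a few lines of Python. The slot widths, the 350W per GPU, and the 2000W power supply come from the text. The 400W allowance for the CPU and everything else is an assumption inferred from the gap between 1400W and 1800W, not a measured figure, and the function names are purely illustrative.

```python
# Back-of-the-envelope slot and power budget for a four-GPU server.
GPU_COUNT = 4
SLOTS_PER_GPU = 2      # dual-slot cards
NIC_SLOTS = 1          # single-slot Infiniband card
WATTS_PER_GPU = 350    # RTX 3090 figure quoted in the text
OTHER_WATTS = 400      # ASSUMPTION: CPU and other parts, inferred as 1800 - 1400
PSU_WATTS = 2000       # rating of the CRPS unit mentioned in the text


def slots_needed(gpus=GPU_COUNT, slots_per_gpu=SLOTS_PER_GPU, nic_slots=NIC_SLOTS):
    """Expansion slots taken up by the GPUs plus the network card."""
    return gpus * slots_per_gpu + nic_slots


def power_needed(gpus=GPU_COUNT, watts_per_gpu=WATTS_PER_GPU, other_watts=OTHER_WATTS):
    """Rough power draw in watts: GPUs plus everything else."""
    return gpus * watts_per_gpu + other_watts


print("slots needed:", slots_needed())
print("power needed (W):", power_needed())
print("headroom on the PSU (W):", PSU_WATTS - power_needed())
```

Changing `OTHER_WATTS` to match your own CPU and drives is the obvious first tweak; the point is only that the headroom is thin once four such GPUs are installed.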

To conclude, if you think building your own PC is challenging, building a GPU server for an HPC cluster is probably three times the challenge. Another reason why you should not maintain your own infrastructure.

PCIe Gen 4 GPUs do not play nice with a Gen 3 extender board

I spent over an hour trying to figure out why some new GPUs were not working. The server in question is an Asrock Rack 2U4G-EPYC-2T, a specialized server that allows four GPUs to be installed in a relatively small case. Google was not helpful because, understandably, this is a niche product produced only in small quantities.

What did not work:

  • Attaching four Ampere GPUs (i.e. RTX 3000 series) in their intended positions in the case.

What worked:

  • Attaching four Pascal GPUs (i.e. GTX 1000 series) in the intended positions.
  • Attaching only one Ampere GPU at the rear of the case.
  • Attaching four Ampere GPUs directly to the mainboard.

It took me a good hour to figure out that the issue was caused by the PCIe extender board. The three GPU positions at the front require the extender board, but the board only supports PCIe Gen 3. Normally, Gen 4 GPUs can negotiate with Gen 3 mainboards to communicate at PCIe Gen 3, but apparently they cannot do that through the extender board. Once the issue had been identified, the solution was straightforward: manually setting the PCIe lanes to Gen 3 solved everything.
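
For anyone chasing a similar problem on Linux, here is a minimal sketch of how to see which PCIe generation each device has actually negotiated. It reads the `current_link_speed` and `max_link_speed` attributes that the kernel exposes under `/sys/bus/pci/devices`. The exact wording of those strings varies between kernel versions, so the parser only looks at the leading GT/s number; the function names are illustrative. It only reports the link state and does not change any setting, and a GPU that fails to enumerate at all will simply be missing from the list.

```python
import re
from pathlib import Path

# Per-lane transfer rate (GT/s) for each PCIe generation.
GT_PER_S_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def gen_from_speed(text):
    """Map a sysfs link speed string such as '16.0 GT/s PCIe' to a generation."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", text)
    if not match:
        return None  # e.g. 'Unknown' or an unexpected format
    return GT_PER_S_TO_GEN.get(float(match.group(1)))


def report_links(root="/sys/bus/pci/devices"):
    """Return (device, current_gen, max_gen) for every device exposing link info."""
    rows = []
    base = Path(root)
    if not base.is_dir():
        return rows  # not Linux, or sysfs not mounted
    for dev in sorted(base.iterdir()):
        try:
            current = (dev / "current_link_speed").read_text()
            maximum = (dev / "max_link_speed").read_text()
        except OSError:
            continue  # device has no PCIe link attributes
        rows.append((dev.name, gen_from_speed(current), gen_from_speed(maximum)))
    return rows


if __name__ == "__main__":
    for name, current, maximum in report_links():
        print(f"{name}: running at Gen {current}, capable of Gen {maximum}")
```

A device that is capable of Gen 4 but sits behind a Gen 3 board should show up as running at Gen 3 once the lanes have been forced down.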

Yet another reason why maintaining your own computing infrastructure is not for the faint-hearted.

“B-F-G-P-U”

We will be running tests and benchmarks here at CUHK SCRP over the next few days. Users should be able to access the new RTX 3090 through Slurm after the scheduled maintenance next week.
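
As a rough idea of what requesting a GPU through Slurm looks like, here is a small sketch that builds an `srun` command line. `--gres=gpu:N` is standard Slurm syntax, but the helper name is hypothetical and the GPU type label is site-specific: the label SCRP will use for the new cards is not stated here, so the example leaves it out. Check the SCRP user guide for the actual options.

```python
import shlex


def gpu_job_command(program, gpus=1, gpu_type=None):
    """Build an srun command that asks Slurm for GPUs via --gres.

    gpu_type is an optional, site-specific label; ASSUMPTION: whatever
    label a cluster uses must be looked up in its own documentation.
    """
    gres = f"gpu:{gpu_type}:{gpus}" if gpu_type else f"gpu:{gpus}"
    return ["srun", f"--gres={gres}", *program]


# List the GPUs visible to a job that requested one GPU.
command = gpu_job_command(["nvidia-smi", "-L"])
print(shlex.join(command))
```

The function only assembles the command; actually running it requires a login on a Slurm cluster.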

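SCRP: A “Supercomputer” Built in Two Months

Over the past two days, senior students in the department should have received my email about the department's brand-new online system. The new system consists of multiple servers operating as one, with a range of statistical software available for students to use online. A system like this is generally called a High Performance Computing Cluster, though most people probably know it better by its popular name, “supercomputer”. The reason for building it: the pandemic has closed all of the department's computer labs, so letting several hundred students use statistical software from home became a problem that had to be solved.

In mid-June, with the department's support, I scraped together a budget of 200,000 from various sources and built the new system, SCRP, in two months. That sum is a lot for a department, but in high-performance computing it often will not even buy a single machine. To stretch the budget, SCRP uses a considerable number of second-hand parts. Fortunately, supply exceeds demand in the second-hand market for high-performance computing parts, so it is not hard to find suitable parts at one-fifth or even one-tenth of the price. Together with some older servers borrowed from the department, the whole system was completed in mid-August.

With this new cluster, CUHK Economics may well be the first economics department to require all students to learn to use a high-performance computing system (what an evil teacher). Universities all have their own high-performance computing clusters, but these are usually reserved for researchers; outside computer science departments, undergraduates are rarely given access. Basic use of such a cluster is actually not very complicated: R and Python, for example, can be used directly from a web browser. Being forced by teachers to keep learning new things may be a little pitiable, but as I keep saying, learning more data analysis these days always pays off.

SCRP website and user guide: http://scrp.econ.cuhk.edu.hk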

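Do you have to buy masks with a water resistance certification?

Hongkongers have become highly conscious of epidemic prevention, and many have studied mask standards closely. A claim often seen online is that for protection you must buy masks with a water resistance certification, mainly ASTM F1862, because otherwise there is no guarantee the mask can block droplets. That view is reasonable in normal times, but with masks in short supply worldwide, we have to ask whether uncertified masks really fall short in this respect.

In the spirit of science, and for my own safety, I (周博士) tested the water resistance of masks at home. The conclusion? Basically, most masks are sufficiently water resistant, including industrial N95 masks and thick knock-off masks. Note that my test method was to squirt colored water from a syringe at close range, which produces far more pressure than droplets do; unless you are a healthcare worker, you are unlikely to face anything this dangerous in real life.

In summary: as long as a mask has not been used, it is basically fine. Reusing a heavily soiled mask, on the other hand, is an absolute no.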

Do you need a mask with a water resistance rating for disease protection? The answer is no: most masks, as long as they are not soaked, have enough water resistance even if they have no rating.

PDF of test results.

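At the end of August I went to the San Francisco Bay Area to attend the wedding of a classmate from my PhD program, and took the chance to catch up with some good friends there. One of them is a pioneer in artificial intelligence research who is now in management at an internationally renowned technology company, leading an AI research team of more than forty people. First, a piece of advice on studying: I only got to know this friend through the happenstance of doing my PhD, and when we met we were nothing more than board game buddies; I never imagined I would learn so much from him. That is why I often tell students planning to study abroad or go on exchange that the experience matters more than the coursework. Whatever you do, do not trap yourself in a familiar circle!

After a tour of his company's brand-new, spaceship-like headquarters, our conversation turned to the competition between China and the US in artificial intelligence. I asked him: it is common knowledge that AI specialists in Europe and the US command extremely high salaries, so what stops China, with its relatively low wages, from overtaking the US through sheer manpower? Having worked at one of China's foremost internet companies, this friend had his own take. He said Chinese tech companies are of course doing exactly that: he estimated the current wage gap at about tenfold, with a corresponding gap in the number of people hired. The problem is that Chinese companies are governed top-down, and employees are constantly trying to guess what their superiors want, so a great deal of time is consumed by development work with no practical value.

This brings me to what I want to discuss this time: China's development has not actually broken away from the labor-intensive model. China had a total of 1.54 million patent applications last year, more than two and a half times the US figure and the highest in the world (Note 1). But if we look at per-capita numbers, the US has 1,800 applications per million people, while China has only 1,100. An example closer to home: people often point to Shenzhen's total GDP far exceeding Hong Kong's as evidence of the former's superiority, but Shenzhen's GDP per capita last year was still less than 60% of Hong Kong's.

Some will say a large population is an advantage too. At a particular time and place, yes. But people age, and today's advantage may be tomorrow's burden. It will take considerable time for China's GDP per capita to catch up with that of the US, but its median age is projected to overtake the latter's next year. According to data from the National Bureau of Statistics, China's population aged 15 to 64 peaked in 2013 and has been shrinking by several million a year (Note 2). Many countries face population aging, but as The Economist points out, China's problem is that it is growing old before growing rich (Note 3). A shrinking working-age population brings a rising dependency ratio, and scientific research is key to coping with this problem.

Scientific research has the characteristics of what economists call a “public good”: the cost of sharing its results with everyone is very low. So even if a country's research efficiency is low, as long as it has enough people, top talent or even sheer luck will always produce some results, and the country can still develop rapidly. Conversely, when a country's working population declines, its structural weaknesses gradually come to light. With the working population on a downward trend, the state ought to be loosening restrictions to unleash vitality; in reality, its control over people and businesses keeps intensifying. In a land without freedom, where would creativity come from, and where would long-term development come from?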

Note 1: https://www.wipo.int/pressroom/en/articles/2019/article_0012.html
Note 2: http://data.stats.gov.cn/easyquery.htm?cn=C01&zb=A0301&sj=2018
Note 3: https://www.economist.com/finance-and-economics/2019/10/31/chinas-median-age-will-soon-overtake-americas