Published: May 13, 2025 by PerfGrid
Pulse Week 20
This week we talk about new hardware, improvements to Valkey Manager in hosting-panel.net and more
New Server Deployed
Last week we took delivery of a new server for our Grid Hosting plans. As with nlsh04 and nlsh05, which we currently have in production in Amsterdam, we're using Supermicro as our vendor, specifically the CloudDC A+ Server AS-1115CS-TNR.
The new server comes with a slight upgrade: we've moved from AMD EPYC Genoa 9254 to AMD EPYC Turin 9255, and from 4800MHz ECC Memory to 6000MHz ECC Memory, further boosting performance.
We continue to use Samsung PM9A3 NVMes because their reliability and performance characteristics suit our environment well.
We ran a Geekbench 6 benchmark on the system and observed a 23% improvement in single-core and a 35.5% improvement in multi-core workloads. While these results may not directly translate to real-world performance gains in a shared hosting environment, they do demonstrate generational improvements in the hardware.
We are still in the process of deploying our software to the new server. Once this is complete, we will begin extensive end-to-end testing, and we hope to make the server available to customers within the next two weeks.

Interestingly, hardware costs are currently rising steadily across the market. A few years ago, Samsung decided to cut NAND chip production, effectively driving up NAND pricing. As a result, solid-state storage prices have climbed steadily, and enterprise-grade NVMe drives now cost more than double what they did just a few years ago. Other server component costs have also gone up over the years.
Such increases naturally affect the overall cost of servers. Compared with what we paid for our servers in December 2023, the exact same model costs 17% more today, even if we were to use the AMD EPYC Genoa 9254 CPU.
We do hope that hardware prices will start to drop again in the future, as we've seen in the past.
Valkey Manager gets statistics
We've added statistics to Valkey Manager. They show the active memory usage in Valkey, the Cache Hit Ratio, operations per second, and the traffic through Valkey, as well as the uptime and the number of keys stored.
These statistics are collected at a 10-second interval by a custom collector and stored in ClickHouse for 30 days, amounting to 259,200 datapoints per metric for each Valkey instance.
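The datapoint figure follows directly from the interval and the retention window. A minimal sketch of that arithmetic, where the function name is purely illustrative and not part of our collector:

```python
from datetime import timedelta


def datapoints_per_instance(interval: timedelta, retention: timedelta) -> int:
    """Number of samples retained when collecting at a fixed interval."""
    # Floor division of two timedeltas yields a plain integer.
    return retention // interval


# 10-second interval, 30-day retention, as described above.
points = datapoints_per_instance(timedelta(seconds=10), timedelta(days=30))
print(points)  # 259200
```

That is 6 samples per minute, 8,640 per day, and 259,200 over 30 days.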
One great benefit is that the Cache Hit Ratio can help determine whether the selected "Max Memory" configuration is high enough: a low Cache Hit Ratio often means there's a high rotation of keys within Valkey. On a low-traffic site, of course, it may simply mean that caching is not yet beneficial enough.
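For readers who want to check the ratio by hand, here is a minimal sketch of one common way to derive it: from the `keyspace_hits` and `keyspace_misses` counters in the `stats` section of Valkey's `INFO` output. This is an assumption for illustration; it is not necessarily how the Valkey Manager collector computes the figure, and the helper names and sample values below are hypothetical.

```python
from typing import Dict, Optional


def parse_info_stats(info_text: str) -> Dict[str, str]:
    """Parse 'key:value' lines from INFO output, skipping '# Section' headers."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    lookups = hits + misses
    if lookups == 0:
        return None  # no traffic yet, so the ratio is undefined
    return hits / lookups


# Hypothetical sample of an INFO stats section (values are made up).
sample = "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\n"
stats = parse_info_stats(sample)
ratio = cache_hit_ratio(int(stats["keyspace_hits"]), int(stats["keyspace_misses"]))
print(ratio)  # 0.9
```

Note that these counters are cumulative since the instance started, so a ratio for a specific time window would need the difference between two samples rather than the raw totals.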

Our faulty hardware saga continues
About two months ago we had stability issues on the nlcp03 server. We even wrote about it in our Pulse Week 11 post.
After weeks of testing hardware, we found that the problem is not related to the NIC, as we first suspected. The issues continued after we replaced the NIC in the system with a spare, an older Intel X520-DA2, and when we tested the original Mellanox NIC in another system, we saw no issues with the card at all.
In fact, we began to see random crashes on the server quite frequently. Fortunately, this is a server we are not actively using in production at the moment.
While at the datacenter, we decided to troubleshoot by removing all DIMMs from the server, suspecting a faulty DIMM that the system had not detected. We started with two DIMMs and observed no crashes, but as soon as we added more, the crashes resumed.
We have therefore taken the server with us for further debugging, either to identify the faulty DIMM(s) or to reseat the CPU in its socket, which may also resolve the issue.
Due to the size of modern CPUs, any imbalance in pressure across the CPU can cause stability issues. This pressure can change over time, especially when moving equipment, as we did during the migration in December 2024.
