Let’s first consider the speed of light (SOL) in a fiber cable. The number provided by the Qlogic crew for the webinar was 5 ns (nanoseconds) to travel one meter in a fiber cable (it takes light 3.3 ns to travel one meter in a vacuum). How can we translate that into a cluster diameter? Latency is measured in seconds and the SOL is measured in meters per second, so a latency budget can be converted into a distance. Here is one way. First we have to define some terms:
LT is the total end to end latency
Lnode is the latency of the node (getting the data on/off the wire)
Lswitch is the latency of a single switching chip
Nhop is the number of switch chips (hops) the data passes through
Lcable is latency of the cable, which is a function of length
A formula may be written for the total latency as follows:
$$L_T = L_{node} + L_{switch} \cdot N_{hop} + L_{cable} \tag{1}$$
If we take equation 1, solve for Lcable, and then divide the right-hand side by the fiber delay of 5 ns per meter, we get what I call the core-diameter:
$$d_{core} = \frac{L_T - (L_{node} + L_{switch} \cdot N_{hop})}{5\ \text{ns/m}} \tag{2}$$
The core-diameter is the maximum diameter of a cluster in meters. Let’s use some simple numbers. Suppose I need 2 μs (microseconds) latency for my application to run well (this is LT), my nodes contribute 1 μs, and I use a total of 6 switch chips with a latency of 140 ns (nanoseconds) each. That leaves 2000 - (1000 + 6 × 140) = 160 ns for the cable, and at 5 ns per meter I get a core diameter of 32 meters. This diameter translates to a sphere of 17 thousand cubic meters. If we take an average 1U server and assume its volume is 0.011 cubic meters, then we could fit about 1.6 million servers in our core diameter. In practical terms, the real amount is probably half, allowing for human access, cooling, racks, etc. So we are at about 780 thousand servers. If we assume 8 cores per server, then we come to a grand total of 6.2 million cores. If we run the numbers with LT of 3 μs, the core diameter grows to 232 meters, the number explodes to almost 600 million servers by raw volume, and we can see why cable distance has not been an issue.
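For anyone who wants to try other latency budgets, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of equation 2 and the sphere-volume estimate; the function and parameter names are my own, and the 5 ns per meter, 0.011 cubic meter, and "half is usable" figures are the assumptions stated in the text.

```python
import math

NS_PER_METER = 5.0  # fiber propagation delay quoted in the text (assumption)


def core_diameter_m(total_ns, node_ns, switch_ns, hops, ns_per_m=NS_PER_METER):
    """Equation 2: latency left over for the cable, converted to meters."""
    cable_ns = total_ns - (node_ns + switch_ns * hops)
    if cable_ns < 0:
        raise ValueError("latency budget is used up before any cable is added")
    return cable_ns / ns_per_m


def servers_in_sphere(diameter_m, server_m3=0.011):
    """Count of 1U servers that fit in a sphere of this diameter, by volume only."""
    volume_m3 = (4.0 / 3.0) * math.pi * (diameter_m / 2.0) ** 3
    return volume_m3 / server_m3


# The example from the text: 2 us total, 1 us in the nodes, 6 chips at 140 ns.
d = core_diameter_m(total_ns=2000, node_ns=1000, switch_ns=140, hops=6)
raw_servers = servers_in_sphere(d)
usable_servers = raw_servers / 2   # half lost to access, cooling, racks
cores = usable_servers * 8         # 8 cores per server

print(f"core diameter: {d:.0f} m")
print(f"servers (raw volume): {raw_servers:,.0f}")
print(f"servers (usable half): {usable_servers:,.0f}")
print(f"cores: {cores:,.0f}")
```

Calling `core_diameter_m(total_ns=3000, node_ns=1000, switch_ns=140, hops=6)` gives the 232 meter diameter for the 3 μs case, and feeding that into `servers_in_sphere` shows how quickly the count grows, since volume scales with the cube of the diameter.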