Enterprise applications often involve mission-critical operations that require high-performance workstations or servers to operate reliably 24/7. Enterprises therefore install ECC RAM so that these applications can run continuously without memory failures that could cause costly damage.
What is ECC Memory?
ECC memory, or error-correcting code memory, is a type of RAM (random access memory) that uses an error-correcting code to detect and correct data corruption in memory. In contrast to non-ECC RAM, which has no built-in way to detect memory errors, ECC RAM can detect and fix the most common memory errors on the fly, before they cause data corruption or even system crashes. This is why ECC memory is used in numerous enterprise applications, especially mission-critical ones.
What Causes Memory Data Corruption in RAM?
The smallest parts inside volatile memory like RAM are called cells. A memory cell is an electronic circuit that stores an electrical charge representing one bit of binary information, either 1 or 0. The values of multiple memory cells form a binary sequence, which the computer translates into data. Each sequence of 1s and 0s has its own unique value. For instance, the binary sequence 1001011 represents the number 75.
Data corruption occurs when one of these memory cells malfunctions and suddenly changes its state from 0 to 1 or vice versa within a byte of 8 bits of data. This misrepresentation of a bit in a memory cell is known as a single-bit error. A bit flip from a single-bit error can be harmless, but it can also be damaging, causing the system to run the wrong code or even shut down. Here is a quick example of single-bit errors and how they can be subtle or damaging.
Let’s assume the correct information of the memory is supposed to be the number 75 with a binary sequence of 1001011.
A single bit flip from 1001011 (75) to 1001010 yields the number 74, which is still close to 75 and fairly harmless to some applications.
However, a single bit flip from 1001011 (75) to 1101011 yields the number 107, which is far from 75 and can be harmful to some applications.
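The two flips above can be reproduced with a bitwise XOR. This is only an illustrative sketch: the helper name `flip_bit` is made up for this example, and bit positions are counted from 0 at the least significant (rightmost) bit.

```python
def flip_bit(value: int, position: int) -> int:
    """Return `value` with the bit at `position` inverted (0 = rightmost bit)."""
    return value ^ (1 << position)


correct = 0b1001011                # 75, the value that should be stored

low_flip = flip_bit(correct, 0)    # rightmost bit flips: 1001011 -> 1001010
high_flip = flip_bit(correct, 5)   # a high-order bit flips: 1001011 -> 1101011

for label, value in [("correct", correct),
                     ("low-order flip", low_flip),
                     ("high-order flip", high_flip)]:
    print(f"{label:16} {value:07b} = {value}")
```

The same single-bit fault changes the value by 1 or by 32 depending only on where it lands, which is why the impact of a bit flip is so unpredictable.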
What Can Trigger Single-Bit Errors?
There are two types of single-bit memory errors: hard errors and soft errors. Unfortunately, some of their triggers are fairly common, especially in industrial computing applications. By some estimates, RAM can experience around five single-bit errors per hour of use on 8 GB of memory.
Hard Single-Bit Errors (caused by physical factors):
- Voltage Stress
- Extreme Temperature
- Shock and Vibration Impact
- Manufacturing Defect
Soft Single-Bit Errors (factors harder to detect):
- Improper Read/Write Process
- Electromagnetic Interference (EMI)
- Electrical Interference
- Magnetic Interference
- Alpha particles
- Cosmic Rays
How Does Error-Correcting Code (ECC) Fix Bit Flips?
Error-correcting code memory can detect corrupted data and restore it using an error-correcting code (ECC), fixing errors in real time. ECC is an advanced form of parity: for every 64 bits of data, the memory stores additional check bits computed from that data. Seven check bits are enough to locate and correct any single flipped bit in a 64-bit word, and an eighth bit lets the memory also detect double-bit errors, which is why ECC modules are typically 72 bits wide. Basic parity, by comparison, uses a single parity bit for every 8 bits of data and can only detect an error, not locate it. The extra check bits are what allow ECC RAM not only to detect errors but also to recover the correct data.
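To make the difference between basic parity and ECC check bits concrete, here is a small sketch. The function names are hypothetical, and the check-bit count uses the standard Hamming bound (the smallest r with 2^r >= m + r + 1 for m data bits), with one extra overall parity bit assumed for double-error detection.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte).count("1") % 2


def parity_ok(byte: int, parity: int) -> bool:
    """True if the stored parity bit still matches the data."""
    return even_parity_bit(byte) == parity


def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


stored = 0b01001011                 # 75 stored in one byte
parity = even_parity_bit(stored)

one_flip = stored ^ 0b00000001      # one bit flipped
two_flips = stored ^ 0b00000011     # two bits flipped

print(parity_ok(one_flip, parity))   # False: error detected, location unknown
print(parity_ok(two_flips, parity))  # True: the double flip goes unnoticed

sec_bits = hamming_check_bits(64)    # check bits for single-error correction
secded_bits = sec_bits + 1           # plus one overall parity bit
print(sec_bits, secded_bits, 64 + secded_bits)
```

A single parity bit only says that something changed. The larger set of check bits lets the memory controller work out which bit changed, so it can flip it back.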
The advanced parity scheme is most commonly a single-error correction and double-error detection (SECDED) Hamming code, one of the most common error-correcting codes. Some systems instead use triple modular redundancy (TMR), which can be faster than Hamming error correction at the cost of storing more redundant data. Storing the extra check bits requires ECC RAM to include an additional chip on the memory module. This is why an ECC module has 9 memory chips, compared to 8 on a non-ECC module.
The encoding and checking process makes ECC RAM more reliable but also slightly slower than non-ECC RAM, by around 1%-2%, which is not a significant loss compared to the benefits that ECC RAM offers.
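The SECDED idea can be sketched in software with a textbook extended Hamming code. This is a simplified illustration, not how any particular memory controller implements it: real hardware computes the check bits in parallel logic, and the function names and the bit layout (overall parity at index 0, check bits at power-of-two indices) are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def secded_encode(data: int, data_bits: int = 64) -> List[int]:
    """Encode `data` as an extended Hamming codeword (a list of bits)."""
    r = 0
    while 2 ** r < data_bits + r + 1:    # number of Hamming check bits
        r += 1
    n = data_bits + r
    code = [0] * (n + 1)                 # index 0 = overall parity bit
    shift = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = (data >> shift) & 1
            shift += 1
    for i in range(r):                   # each check bit covers positions
        p = 1 << i                       # whose index has bit i set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2              # make the total number of 1s even
    return code


def secded_decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); data is None when the error is uncorrectable."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos              # XOR of the indices of all set bits
    overall_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif overall_ok:                     # syndrome set but parity even
        return None, "double-bit error detected"
    elif syndrome > n:
        return None, "uncorrectable error"
    else:
        code[syndrome] ^= 1              # syndrome is the flipped position
        status = "corrected"
    data, shift = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= code[pos] << shift
            shift += 1
    return data, status


word = 0x0123456789ABCDEF
codeword = secded_encode(word)
print(len(codeword))                     # total bits stored per 64-bit word

single = list(codeword)
single[37] ^= 1                          # simulate one bit flip
print(secded_decode(single))

double = list(codeword)
double[5] ^= 1                           # simulate two bit flips
double[60] ^= 1
print(secded_decode(double))
```

With one flipped bit the decoder recovers the original word. With two flipped bits it cannot tell which bits are wrong, but it still reports the error instead of silently returning bad data.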
ECC memory vs. non-ECC memory
|Factors||ECC Memory||Non-ECC Memory|
|Number of Chips||9 memory chips (one for ECC)||8 memory chips|
|Reliability||Ultra-reliable (0.09% failure rate)||Normal (0.6% failure rate)|
|Durability||Highly durable for 24/7 usage||Less durable for constant usage|
|Protection Features||Can detect and correct data errors||Cannot detect or correct data errors|
|Speed||Slower (1%-2% slower for registered ECC RAM)||Faster (no ECC encoding overhead)|
|Price||10-20% higher (due to the additional ECC chip and lower supply)||Lower (more mainstream and affordable)|
|Power Consumption||May use slightly more power for the additional ECC chip||Uses less power with only eight chips|
|Compatibility||Only works with ECC-enabled CPUs, motherboards, and chipsets||Works with a wider range of CPUs, motherboards, and chipsets|
What CPUs, Motherboards, or Chipsets Support ECC Memory?
To use ECC memory, the CPU, motherboard, and chipset must all support ECC RAM. Unsupported models either will not work with ECC RAM at all or will run the RAM without the ECC feature. Consumer-grade motherboards and chipsets often do not support ECC RAM, whereas server-grade motherboards and chipsets do. Likewise, the CPUs that support ECC RAM tend to be higher-end workstation and server CPUs, such as Intel Xeon processors or AMD Threadripper CPUs. These are standard specs for enterprise-level server applications, which prioritize high-performance and reliable computing.
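On a Linux system, one way to see whether ECC is actually active is to read the corrected and uncorrected error counters that the kernel's EDAC subsystem exposes through sysfs. The sketch below assumes the usual `/sys/devices/system/edac/mc` layout and that an EDAC driver for the platform is loaded. The function name is hypothetical, and an empty result does not prove that the hardware lacks ECC.

```python
from pathlib import Path


def read_edac_counts(edac_root: str = "/sys/devices/system/edac/mc") -> dict:
    """Return {controller: (corrected_errors, uncorrected_errors)}.

    Reads the ce_count and ue_count files of each memory controller
    directory (mc0, mc1, ...). Returns an empty dict if none are found.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue                     # skip unreadable or malformed entries
        counts[mc.name] = (corrected, uncorrected)
    return counts


counts = read_edac_counts()
if not counts:
    print("No EDAC memory controllers reported (ECC may be absent or the driver not loaded).")
for name, (corrected, uncorrected) in counts.items():
    print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

A steadily rising corrected-error count on one controller is often treated as an early warning that a memory module should be replaced.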
When is ECC Memory Worth It?
The high overall setup cost of ECC memory might not seem worth the expense for consumer use, since consumers often prefer speed over reliability. However, for enterprise-level applications, ECC memory is an essential investment, because mission-critical applications require the most reliable systems they can get. Moreover, this added reliability might not just save costs and time. It can even save the lives of the people around the deployment area. Therefore, enterprises use ECC memory in their computers to maintain the reliability of various industrial deployments in extreme environments.
Applications that use ECC memory:
- Servers and Data Centers
- Industrial Automation
- Medical Industry
- Financial Institutions
- Military & Defense
- Space Industry
The Future of ECC Memory and DDR5 RAM – What lies ahead?
Unlike non-ECC DDR4 RAM, DDR5 SDRAM has ECC embedded in each memory chip (on-die ECC). This lets denser, higher-capacity chips correct bit flips internally, so DDR5 RAM can detect and correct bit flips before transmitting data to the CPU. However, this is still different from DDR5 ECC RAM, which also detects and corrects bit errors but does so with a dedicated ECC chip that offers stronger protection than consumer-grade DDR5 RAM.
Industrial Server-Grade Computers with ECC Memory Support
Premio's latest industrial computers leverage the performance enhancements of Intel 10th Generation Core and Xeon-W processors with the W480E chipset. The Xeon processors provide ECC memory support for robust, reliable performance in the most compute-intensive applications, such as mission-critical data acquisition and telemetry in edge computing deployments.
RCO-6000-CML Modular AI Edge Inference Computer Series
- 10th Gen Intel® Core™ & Intel® Xeon® W processors with W480E chipset
- Access to Error Correction Code (ECC) Memory
- Modular EDGEBoost Nodes for inference and machine learning workloads
- Plug and Play Dual-SIM 5G & 4G/LTE cellular network module
- Workload consolidation at the edge with versatile I/O
- Ruggedized and Tested for rugged edge computing
ACO-6000-CML Fanless In-Vehicle Computer Series
- Intel® 10th Gen Core and Xeon-W processors with W480E chipset
- EN50155 Railway Certification Ready
- Wide Power Supply Input 9-48VDC and 48-110VDC
- Support up to 18x LAN, 16x PoE, and 16x USB
- Built-in CAN Bus Transportation Protocol
- Power Ignition Management
- ECC Memory Support
- 5G Ready