
Copy-On-Write (COW) is an optimization strategy used in program design. The idea is simple: all users initially share the same underlying data, and only when one of them needs to modify it is a copy made and the change applied to that copy. In other words, the cost of copying is deferred until a write actually happens.
Since JDK 1.5, the java.util.concurrent package has included two containers built on this mechanism: CopyOnWriteArrayList and CopyOnWriteArraySet. In read-heavy concurrent scenarios, they can be very practical.
What a Copy-On-Write container actually does
A Copy-On-Write container does not modify the current container in place when you add or update elements. Instead, it first copies the current contents into a new container, performs the modification there, and then switches the reference so future readers see the new container.
The benefit is that concurrent reads can happen without locking, because the container being read is not being changed. In that sense, Copy-On-Write is a form of read-write separation: readers and writers are effectively working against different container instances.
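A minimal sketch of the copy, modify, switch sequence, using a plain array behind a volatile field. SnapshotSwapDemo and its methods are hypothetical names chosen for illustration. The volatile modifier is what guarantees that a reader sees the fully built new array once the reference has been switched.

```java
import java.util.Arrays;

// Hypothetical demo class: the array behind "data" is never modified in place.
public class SnapshotSwapDemo {
    private volatile String[] data = {"a", "b"};

    // Readers just take the current reference; no lock is needed.
    public String[] snapshot() {
        return data;
    }

    // Writers copy, modify the copy, then switch the reference.
    public synchronized void append(String s) {
        String[] copy = Arrays.copyOf(data, data.length + 1); // 1. copy
        copy[copy.length - 1] = s;                              // 2. modify the copy
        data = copy;                                            // 3. publish the new array
    }

    public static void main(String[] args) {
        SnapshotSwapDemo demo = new SnapshotSwapDemo();
        String[] before = demo.snapshot(); // a reader grabs the current array
        demo.append("c");                  // a writer replaces it
        String[] after = demo.snapshot();

        System.out.println(Arrays.toString(before)); // [a, b]
        System.out.println(Arrays.toString(after));  // [a, b, c]
    }
}
```

The reader's array (before) is never touched by the writer, which is exactly why reads need no lock: readers and writers are working on different array instances.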
How CopyOnWriteArrayList works
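Before using CopyOnWriteArrayList, it helps to understand how it is implemented. The following code, adapted from the JDK 8 source, shows how an element is added (JDK 9 and later guard writes with a synchronized block instead of a ReentrantLock, but the copy-and-swap logic is the same). Writes must be locked: otherwise two writers could copy the same array at the same time, and whichever published its copy last would silently discard the other's element.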
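// Excerpt from CopyOnWriteArrayList<E> (JDK 8)
final transient ReentrantLock lock = new ReentrantLock();
// volatile, so a newly published array is visible to all readers
private transient volatile Object[] array;

final Object[] getArray() {
    return array;
}

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        // Copy the current array into a new one with room for one more element
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        // Add the new element to the new array
        newElements[len] = e;
        // Point the array reference at the new array
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}

final void setArray(Object[] a) {
    array = a;
}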
Reads, on the other hand, take no lock. If several threads are adding data while another thread reads, the reader sees whichever array was current when it called getArray(): until a writer calls setArray, that is still the old array. Writers never block readers, because they never modify the array that readers are looking at.
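public E get(int index) {
    return get(getArray(), index); // no lock: read from the current array
}
@SuppressWarnings("unchecked")
private E get(Object[] a, int index) { return (E) a[index]; }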
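Putting add and get together, the whole pattern fits in a few lines. This is a simplified sketch, not the JDK source: the class name MiniCowList is hypothetical, and it keeps only add, get, and size. As in the JDK, the array field is volatile so that a lock-free reader always sees the most recently published array.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified copy-on-write list (not the JDK's CopyOnWriteArrayList).
public class MiniCowList<E> {
    private final ReentrantLock lock = new ReentrantLock();
    // volatile: publishing a new array happens-before any later read of the field
    private volatile Object[] array = new Object[0];

    private Object[] getArray() {
        return array;
    }

    public boolean add(E e) {
        lock.lock(); // serialize writers so that no update is lost
        try {
            Object[] elements = getArray();
            Object[] newElements = Arrays.copyOf(elements, elements.length + 1);
            newElements[elements.length] = e;
            array = newElements; // publish the new array
            return true;
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        return (E) getArray()[index]; // lock-free read of the current snapshot
    }

    public int size() {
        return getArray().length;
    }

    public static void main(String[] args) throws InterruptedException {
        MiniCowList<Integer> list = new MiniCowList<>();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    list.add(i);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println(list.size()); // 4000: the lock prevents lost updates
    }
}
```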
Java does not provide a CopyOnWriteMap in the JDK, but it is straightforward to build one by following the same idea used by CopyOnWriteArrayList:
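import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
// Every write copies the whole HashMap and then publishes the copy through
// the volatile field; reads go to the current snapshot without locking.
public class CopyOnWriteMap<K, V> implements Map<K, V>, Cloneable {
    private volatile Map<K, V> internalMap;
    public CopyOnWriteMap() {
        internalMap = new HashMap<K, V>();
    }
    public CopyOnWriteMap(int initialCapacity) {
        internalMap = new HashMap<K, V>(initialCapacity);
    }
    public V put(K key, V value) {
        synchronized (this) {
            Map<K, V> newMap = new HashMap<K, V>(internalMap); // copy
            V val = newMap.put(key, value);                    // modify the copy
            internalMap = newMap;                              // publish
            return val;
        }
    }
    public void putAll(Map<? extends K, ? extends V> newData) {
        synchronized (this) {
            Map<K, V> newMap = new HashMap<K, V>(internalMap);
            newMap.putAll(newData); // one copy for the whole batch
            internalMap = newMap;
        }
    }
    public V remove(Object key) {
        synchronized (this) {
            Map<K, V> newMap = new HashMap<K, V>(internalMap);
            V val = newMap.remove(key);
            internalMap = newMap;
            return val;
        }
    }
    public synchronized void clear() {
        internalMap = new HashMap<K, V>();
    }
    // Read operations: no locking
    public V get(Object key) { return internalMap.get(key); }
    public int size() { return internalMap.size(); }
    public boolean isEmpty() { return internalMap.isEmpty(); }
    public boolean containsKey(Object key) { return internalMap.containsKey(key); }
    public boolean containsValue(Object value) { return internalMap.containsValue(value); }
    // Views are read-only so callers cannot modify a published snapshot
    public Set<K> keySet() { return Collections.unmodifiableSet(internalMap.keySet()); }
    public Collection<V> values() { return Collections.unmodifiableCollection(internalMap.values()); }
    public Set<Map.Entry<K, V>> entrySet() { return Collections.unmodifiableMap(internalMap).entrySet(); }
}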
Once the mechanism is clear, you can apply it to different kinds of containers as needed.
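As a sketch of that generalization, a hypothetical CopyOnWriteBox<T> (not a JDK class) can wrap any mutable container, given a function that copies it. Writers copy, apply their change to the copy, and publish it; readers only ever see fully built snapshots. This sketch assumes callers treat the value returned by read() as read-only, which it does not enforce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical generic copy-on-write holder for any container type T.
public class CopyOnWriteBox<T> {
    private final UnaryOperator<T> copier; // makes an independent copy of the container
    private volatile T value;

    public CopyOnWriteBox(T initial, UnaryOperator<T> copier) {
        this.value = initial;
        this.copier = copier;
    }

    // Lock-free read of the current snapshot; callers must not modify it.
    public T read() {
        return value;
    }

    // Copy, apply the change to the copy, then switch the reference.
    public synchronized void update(Consumer<T> change) {
        T copy = copier.apply(value);
        change.accept(copy);
        value = copy;
    }

    public static void main(String[] args) {
        CopyOnWriteBox<List<String>> box = new CopyOnWriteBox<List<String>>(
                new ArrayList<String>(), current -> new ArrayList<String>(current));
        List<String> oldSnapshot = box.read();
        box.update(list -> list.add("apple"));
        box.update(list -> list.add("banana"));
        System.out.println(oldSnapshot); // []
        System.out.println(box.read());  // [apple, banana]
    }
}
```

The same class works for a HashMap, a TreeSet, or any other container, as long as the copier produces an independent copy.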
Where Copy-On-Write is useful
Copy-On-Write concurrent containers are best suited to read-heavy, write-light workloads. Typical examples include whitelists, blacklists, and product category data, where reads are frequent and updates are relatively rare.
Consider a search site. Users type keywords into a search box, but certain keywords are not allowed. Those blocked terms can be stored in a blacklist that is refreshed once per night. Every time a user searches, the system checks whether the keyword is in the blacklist and rejects it if necessary.
That kind of access pattern matches Copy-On-Write very well:
package com.ifeve.book;

import java.util.Map;

import com.ifeve.book.forkjoin.CopyOnWriteMap;

/**
 * Blacklist service
 *
 * @author fangtengfei
 */
public class BlackListServiceImpl {

    // Presized for the expected number of entries
    private static final CopyOnWriteMap<String, Boolean> blackListMap =
            new CopyOnWriteMap<String, Boolean>(1000);

    public static boolean isBlackList(String id) {
        return blackListMap.get(id) != null;
    }

    public static void addBlackList(String id) {
        blackListMap.put(id, Boolean.TRUE);
    }

    /**
     * Adds blacklist entries in bulk (one copy for the whole batch).
     *
     * @param ids the entries to add
     */
    public static void addBlackList(Map<String, Boolean> ids) {
        blackListMap.putAll(ids);
    }
}
The implementation is simple, but two practical points matter when using a CopyOnWriteMap:
- Reduce resize overhead. Initialize the container with a capacity that matches expected usage as closely as possible, so that write-time copying does not also trigger unnecessary expansion.
- Prefer batch updates. Every individual write causes a copy, so reducing the number of write operations reduces the number of full-container copies. That is why a bulk method such as addBlackList(Map<String, Boolean> ids) is valuable.
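To see why batching matters, here is a sketch with a hypothetical CountingCowMap that follows the same copy-on-write idea and counts how many full copies each style of update performs. The counter exists purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write map that counts full-container copies.
public class CountingCowMap<K, V> {
    private volatile Map<K, V> internalMap = new HashMap<K, V>();
    private int copies = 0; // only read and written while holding the monitor

    public V get(Object key) {
        return internalMap.get(key); // lock-free read
    }

    public synchronized void put(K key, V value) {
        Map<K, V> newMap = new HashMap<K, V>(internalMap); // one full copy per call
        copies++;
        newMap.put(key, value);
        internalMap = newMap;
    }

    public synchronized void putAll(Map<? extends K, ? extends V> data) {
        Map<K, V> newMap = new HashMap<K, V>(internalMap); // one full copy per batch
        copies++;
        newMap.putAll(data);
        internalMap = newMap;
    }

    public synchronized int copies() {
        return copies;
    }

    public static void main(String[] args) {
        CountingCowMap<String, Boolean> oneByOne = new CountingCowMap<>();
        Map<String, Boolean> batch = new HashMap<>();
        for (int i = 0; i < 500; i++) {
            oneByOne.put("id" + i, Boolean.TRUE);
            batch.put("id" + i, Boolean.TRUE);
        }
        CountingCowMap<String, Boolean> batched = new CountingCowMap<>();
        batched.putAll(batch);

        System.out.println(oneByOne.copies()); // 500
        System.out.println(batched.copies());  // 1
    }
}
```

The gap is larger than the copy count suggests: each single put copies every entry already present, so 500 single puts copy 0 + 1 + ... + 499 = 124,750 entries in total, while the batch copies an empty map once.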
The trade-offs behind Copy-On-Write
Copy-On-Write containers have clear advantages, but they also come with two important drawbacks: memory usage and data consistency.
Memory usage
Because writes are handled by copying, both the old container and the new container can exist in memory at the same time during an update. Only the container's internal structure is duplicated (its array of references, or a HashMap's table and entry nodes); the element objects themselves are shared. Newly written objects are added to the new container, while the old one may still be in use by readers. Even so, memory consumption can spike noticeably during writes, especially when the container is large.
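That only references are copied can be checked directly with CopyOnWriteArrayList: after an add, the new backing array still points at the same element objects. The class name ShallowCopyDemo and the 1 MB byte[] payloads are arbitrary choices for illustration.

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class ShallowCopyDemo {
    // Returns true if an existing element is the same object after a write,
    // i.e. the write copied the reference, not the element itself.
    public static boolean elementSharedAcrossWrite() {
        CopyOnWriteArrayList<byte[]> list = new CopyOnWriteArrayList<>();
        list.add(new byte[1024 * 1024]); // a 1 MB element
        byte[] before = list.get(0);
        list.add(new byte[1024 * 1024]); // this write copies the backing array
        return before == list.get(0);    // identity check, not content check
    }

    public static void main(String[] args) {
        System.out.println(elementSharedAcrossWrite()); // true
    }
}
```

What the write does allocate is a new backing array plus the newly added element, and the old array stays reachable for as long as any reader still holds it.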
If the objects involved are large, the impact can be serious. For example, if existing data occupies around 200 MB and another 100 MB is written during an update, total memory usage may rise to 300 MB. Under that kind of load, frequent Young GC or even Full GC can occur. In one real system, a service that used the Copy-On-Write approach to refresh large objects every night ended up triggering a 15-second Full GC during each nightly update, and application response times increased along with it.
To reduce this cost, one option is to compress the elements stored in the container. If all elements are decimal numbers, for example, they might be encoded in base 36 or base 64 to reduce the memory footprint. Another option is simply not to use a Copy-On-Write container at all, and instead choose a different concurrent container such as ConcurrentHashMap.
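Both options can be sketched briefly. Long.toString(long, int) and Long.parseLong(String, int) accept radices up to 36 (Character.MAX_RADIX), so base 36 is available directly; base 64 would need a separate encoder and is not shown. The sketch assumes the IDs are decimal numbers that fit in a long and have no leading zeros (those would be lost in the round trip). The ConcurrentHashMap-backed set uses ConcurrentHashMap.newKeySet() (Java 8+); the class CompactBlacklist and its methods are hypothetical, and combining both options in one class is only for brevity.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical blacklist illustrating both options from the text.
public class CompactBlacklist {
    // Option 2: no copy-on-write at all; ConcurrentHashMap updates in place.
    private final Set<String> ids = ConcurrentHashMap.newKeySet();

    // Option 1: store decimal IDs in base 36 to shorten the stored strings.
    public static String compress(String decimalId) {
        return Long.toString(Long.parseLong(decimalId), 36);
    }

    public static String decompress(String base36Id) {
        return Long.toString(Long.parseLong(base36Id, 36));
    }

    public void add(String decimalId) {
        ids.add(compress(decimalId));
    }

    public boolean contains(String decimalId) {
        return ids.contains(compress(decimalId));
    }

    public static void main(String[] args) {
        String id = "1234567890123";      // 13 decimal digits
        String packed = compress(id);     // 8 characters in base 36
        System.out.println(packed.length() < id.length()); // true
        System.out.println(decompress(packed).equals(id));  // true

        CompactBlacklist blacklist = new CompactBlacklist();
        blacklist.add(id);
        System.out.println(blacklist.contains(id)); // true
    }
}
```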
Data consistency
Copy-On-Write containers provide eventual consistency, not immediate consistency. A write becomes visible to future readers after the new container reference is published, but readers already looking at the old snapshot will continue to see old data.
So if your requirement is that newly written data must be visible immediately to all reads, Copy-On-Write is not the right fit.
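CopyOnWriteArrayList's iterators make this visible: an iterator works on the array that was current when it was created, so it neither sees later writes nor throws ConcurrentModificationException, while a fresh read after the write does see the new element. The class and method names below are illustrative.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotConsistencyDemo {
    // Returns {elements seen by an iterator created before the write,
    //          size seen by a fresh read after the write}.
    public static int[] readBeforeAndAfterWrite() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        Iterator<String> it = list.iterator(); // reader takes a snapshot
        list.add("c");                          // writer publishes a new array

        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return new int[] {seen, list.size()};
    }

    public static void main(String[] args) {
        int[] r = readBeforeAndAfterWrite();
        System.out.println(r[0] + " " + r[1]); // 2 3
    }
}
```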
Copy-on-write was also once common in C++ std::string implementations (GCC's libstdc++ used it, for example), but the C++11 standard's requirements effectively ruled it out, with thread-safety problems among the main concerns.