Working set

The working set for a MongoDB database is the portion of your data that clients access most often. You can estimate size of the working set, using the workingSet document in the output of serverStatus.

To return serverStatus with the workingSet document, issue a command in the following form:

db.runCommand( { serverStatus: 1, workingSet: 1 } )
db.serverStatus( { workingSet: 1 } )

Your working set should stay in memory to achieve good performance. Otherwise many random disk IO’s will occur, and unless you are using SSD, this can be quite slow.

Must my working set size fit RAM?

Your working set should stay in memory to achieve good performance. Otherwise many random disk IO’s will occur, and unless you are using SSD, this can be quite slow.

One area to watch specifically in managing the size of your working set is index access patterns. If you are inserting into indexes at random locations (as would happen with id’s that are randomly generated by hashes), you will continually be updating the whole index. If instead you are able to create your id’s in approximately ascending order (for example, day concatenated with a random id), all the updates will occur at the right side of the b-tree and the working set size for index pages will be much smaller. It is fine if databases and thus virtual size are much larger than RAM.

How do I calculate how much RAM I need for my application?

The amount of RAM you need depends on several factors, including but not limited to:

  • The relationship between database storage and working set.

  • The operating system’s cache strategy for LRU (Least Recently Used)

  • The impact of journaling

  • The number or rate of page faults and other MMS gauges to detect when you need more RAM

MongoDB defers to the operating system when loading data into memory from disk. It simply memory maps all its data files and relies on the operating system to cache data. The OS typically evicts the least-recently-used data from RAM when it runs low on memory. For example if clients access indexes more frequently than documents, then indexes will more likely stay in RAM, but it depends on your particular usage.

To calculate how much RAM you need, you must calculate your working set size, or the portion of your data that clients use most often. This depends on your access patterns, what indexes you have, and the size of your documents.

If page faults are infrequent, your working set fits in RAM. If fault rates rise higher than that, you risk performance degradation. This is less critical with SSD drives than with spinning disks.

How do I read memory statistics in the UNIX top command

Because mongod uses memory-mapped files, the memory statistics in top require interpretation in a special way. On a large database, VSIZE (virtual bytes) tends to be the size of the entire database. If the mongod doesn’t have other processes running, RSIZE (resident bytes) is the total memory of the machine, as this counts file system cache contents.

For Linux systems, use the vmstat command to help determine how the system uses memory. On OS X systems use vm_stat.

Ensure Indexes Fit RAM

For the fastest processing, ensure that your indexes fit entirely in RAM so that the system can avoid reading the index from disk.

To check the size of your indexes, use the db.collection.totalIndexSize() helper, which returns data in bytes:

> db.collection.totalIndexSize()

The above example shows an index size of almost 4.3 gigabytes. To ensure this index fits in RAM, you must not only have more than that much RAM available but also must have RAM available for the rest of the working set. Also remember:

If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and the working set must be able to fit in memory at the same time.

Indexes that Hold Only Recent Values in RAM

Indexes do not have to fit entirely into RAM in all cases. If the value of the indexed field increments with every insert, and most queries select recently added documents; then MongoDB only needs to keep the parts of the index that hold the most recent or “right-most” values in RAM. This allows for efficient index use for read and write operations and minimize the amount of RAM required to support the index.