FAQ: MongoDB Storage

Published 2015-09-14 14:59:35 | Source: compiled from the web

This document addresses common questions regarding MongoDB’s storage system.

If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.

Frequently Asked Questions:

What are memory mapped files?
How do memory mapped files work?
How does MongoDB work with memory mapped files?
What are page faults?
What is the difference between soft and hard page faults?
What tools can I use to investigate storage use in MongoDB?
What is the working set?
Why are the files in my data directory larger than the data in my database?
How can I check the size of a collection?
How can I check the size of indexes?
How do I know when the server runs out of disk space?

What are memory mapped files?

A memory-mapped file is a file with data that the operating system places in memory by way of the mmap() system call. mmap() thus maps the file to a region of virtual memory. Memory-mapped files are the critical piece of the storage engine in MongoDB. By using memory-mapped files, MongoDB can treat the content of its data files as if it were in memory. This gives MongoDB a fast and simple method for accessing and manipulating data.

How do memory mapped files work?

Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation. Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.

How does MongoDB work with memory mapped files?

MongoDB uses memory mapped files for managing and interacting with all data. MongoDB memory maps data files to memory as it accesses documents. Data that isn’t accessed is not mapped to memory.

What are page faults?

Page faults will occur if you’re attempting to access part of a memory-mapped file that isn’t in memory.

If there is free memory, then the operating system can find the page on disk and load it to memory directly. However, if there is no free memory, the operating system must:

- Find a page in memory that is stale or no longer needed, and write the page to disk.
- Read the requested page from disk and load it into memory.

On an active system, this process can take a long time, especially compared with reading a page that is already in memory.
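The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: `PageCache` and its fields are hypothetical names, and least-recently-used eviction stands in for "stale or no longer needed". It does not describe how any operating system actually chooses pages.

```javascript
// Toy model of page-fault handling. Not an OS implementation:
// PageCache is a hypothetical name and LRU eviction is an assumption.
function PageCache(capacity) {
  this.capacity = capacity; // number of pages that fit in "RAM"
  this.pages = [];          // resident page ids, least recently used first
  this.faults = 0;          // accesses that had to read from "disk"
  this.evictions = 0;       // pages dropped to make room
}

PageCache.prototype.access = function (pageId) {
  var pos = this.pages.indexOf(pageId);
  if (pos !== -1) {
    // Already in memory: mark the page as most recently used.
    this.pages.splice(pos, 1);
    this.pages.push(pageId);
    return;
  }
  this.faults += 1;
  if (this.pages.length >= this.capacity) {
    // No free memory: give up the least recently used page first.
    this.pages.shift();
    this.evictions += 1;
  }
  // Load the requested page from "disk".
  this.pages.push(pageId);
};

var cache = new PageCache(3);
[1, 2, 3, 1, 2, 4, 1, 5].forEach(function (id) { cache.access(id); });

// Works in the mongo shell (print) and in Node.js (console.log).
var out = (typeof print === "function") ? print : console.log;
out("faults=" + cache.faults + " evictions=" + cache.evictions);
```

With three page slots, the first three accesses fault into free memory, while pages 4 and 5 each force an eviction before they can be loaded.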

What is the difference between soft and hard page faults?

Page faults occur when MongoDB needs access to data that isn’t currently in active memory. A “hard” page fault refers to situations when MongoDB must access a disk to access the data. A “soft” page fault, by contrast, merely moves memory pages from one list to another, such as from an operating system file cache. In production, MongoDB will rarely encounter soft page faults.

What tools can I use to investigate storage use in MongoDB?

The db.stats() method in the mongo shell returns the current state of the “active” database. The Database Statistics Reference document outlines the meaning of the fields in the db.stats() output.
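As a rough illustration of reading that output, the sketch below converts a few size fields to megabytes. It assumes the result contains dataSize, storageSize, indexSize and fileSize, as reported for memory-mapped data files; `summarizeDbStats` and the sample numbers are hypothetical.

```javascript
// Summarize a db.stats() result in megabytes.
// summarizeDbStats and the sample values below are hypothetical.
function summarizeDbStats(stats) {
  var MB = 1024 * 1024;
  return {
    dataMB: stats.dataSize / MB,
    storageMB: stats.storageSize / MB,
    indexMB: stats.indexSize / MB,
    fileMB: stats.fileSize / MB,
    // Share of the files on disk that currently holds documents.
    dataShareOfFiles: stats.dataSize / stats.fileSize
  };
}

// In the mongo shell `db` exists, so real numbers are used; elsewhere
// the made-up sample keeps the sketch runnable.
var dbStats = (typeof db !== "undefined") ? db.stats() : {
  dataSize: 16 * 1024 * 1024,
  storageSize: 32 * 1024 * 1024,
  indexSize: 8 * 1024 * 1024,
  fileSize: 64 * 1024 * 1024
};
var summary = summarizeDbStats(dbStats);

var out = (typeof print === "function") ? print : console.log;
out(JSON.stringify(summary));
```

A low dataShareOfFiles value on a small database is expected, because preallocated files dominate the total.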

What is the working set?

Working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database.

If you run a query that requires MongoDB to scan every document in a collection, the working set expands to include every document. Depending on physical memory size, this may cause documents in the working set to “page out,” that is, to be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, it may incur a hard page fault.

For best performance, the majority of your active set should fit in RAM.

Why are the files in my data directory larger than the data in my database?

The data files in your data directory, which is the /data/db directory in default configurations, might be larger than the data set inserted into the database. Consider the following possible causes:

Preallocated data files.

In the data directory, MongoDB preallocates data files to a particular size, in part to prevent file system fragmentation. MongoDB names the first data file <databasename>.0, the next <databasename>.1, and so on. The first file mongod allocates is 64 megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at which point all subsequent files are 2 gigabytes. The data files include files with allocated space that hold no data: mongod may allocate a 1 gigabyte data file that is 90% empty. For most larger databases, unused allocated space is small compared to the database.
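The allocation schedule described above can be worked through in a few lines. This sketch assumes that "and so on" means each file is twice the size of the previous one until the 2 gigabyte cap; `preallocatedFileSizesMB` is a hypothetical helper, not part of MongoDB.

```javascript
// Sizes (in megabytes) of the first `count` data files for one database,
// ASSUMING each file doubles in size until the 2 gigabyte cap.
function preallocatedFileSizesMB(count) {
  var sizes = [];
  var size = 64;   // first file: 64 megabytes
  var cap = 2048;  // 2 gigabytes
  for (var i = 0; i < count; i++) {
    sizes.push(size);
    size = Math.min(size * 2, cap);
  }
  return sizes;
}

var sizes = preallocatedFileSizesMB(8);
// sizes is [64, 128, 256, 512, 1024, 2048, 2048, 2048]

var totalMB = 0;
sizes.forEach(function (s) { totalMB += s; });

var out = (typeof print === "function") ? print : console.log;
out(sizes.join(", ") + " (total " + totalMB + " MB)");
```

Under that assumption, the sixth file is the first to reach the cap, and eight files occupy 8128 megabytes regardless of how much data they hold.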

On Unix-like systems, mongod preallocates an additional data file and initializes the disk space to 0. Preallocating data files in the background prevents significant delays when a new database file is next allocated.

You can disable preallocation with the noprealloc runtime option. However, noprealloc is not intended for use in production environments: use it only for testing and with small data sets where you frequently drop databases.

On Linux systems you can use hdparm to get an idea of how costly allocation might be:
```
time hdparm --fallocate $((1024*1024)) testfile
```
The oplog.

If this mongod is a member of a replica set, the data directory includes the oplog.rs file, which is a preallocated capped collection in the local database. The default allocation is approximately 5% of disk space on 64-bit installations; see Oplog Sizing for more information. In most cases, you should not need to resize the oplog. However, if you do, see Change the Size of the Oplog.

The journal.

The data directory contains the journal files, which store write operations on disk prior to MongoDB applying them to databases. See Journaling.

Empty records.

MongoDB maintains lists of empty records in data files when deleting documents and collections. MongoDB can reuse this space, but will never return this space to the operating system.

To reclaim deleted space, use either of the following:
- compact, which defragments deleted space. compact requires up to 2 gigabytes of extra disk space to run. Do not use compact if you are critically low on disk space.
- repairDatabase, which rebuilds the database.

Both options require additional disk space to run. For details, see Recover Data after an Unexpected Shutdown.
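For illustration, compact is issued per collection through db.runCommand(). The sketch below only builds the command document unless it runs inside the mongo shell; `compactCommand` is a hypothetical helper and "orders" is simply the example collection name used elsewhere in this FAQ.

```javascript
// Build the command document for compact.
// compactCommand is a hypothetical helper; "orders" is an example name.
function compactCommand(collectionName) {
  return { compact: collectionName };
}

var cmd = compactCommand("orders");

if (typeof db !== "undefined") {
  // Only reached inside the mongo shell, against a real server.
  // Check free disk space first, as described above.
  printjson(db.runCommand(cmd));
  // Alternatively, rebuild the whole current database:
  // printjson(db.repairDatabase());
}
```

Outside the shell the script does nothing beyond constructing `cmd`, which makes it safe to read or adapt without touching a server.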

How can I check the size of a collection?

To view the size of a collection and other information, use the stats() method from the mongo shell. The following example issues stats() for the orders collection:

db.orders.stats();

To view specific measures of size, use these methods:

- db.collection.dataSize(): data size for the collection.
- db.collection.storageSize(): allocation size, including unused space.
- db.collection.totalSize(): the data size plus the index size.
- db.collection.totalIndexSize(): the index size.
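The gap between data size and allocation size is often the number of interest. A minimal sketch, assuming the stats() document carries the size and storageSize fields; `unusedSpace` and the sample values are hypothetical, and in the shell the orders collection is assumed to exist:

```javascript
// Estimate allocated-but-unused space from a db.collection.stats() document.
// unusedSpace and the sample values are hypothetical.
function unusedSpace(stats) {
  return {
    unusedBytes: stats.storageSize - stats.size,
    // Fraction of the allocation that holds documents (0 if nothing allocated).
    fillRatio: stats.storageSize > 0 ? stats.size / stats.storageSize : 0
  };
}

// Real numbers in the mongo shell, a made-up sample elsewhere.
var orderStats = (typeof db !== "undefined")
  ? db.orders.stats()
  : { size: 1024, storageSize: 4096 };
var usage = unusedSpace(orderStats);

var out = (typeof print === "function") ? print : console.log;
out(JSON.stringify(usage));
```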

Also, the following scripts print the statistics for each database and collection:

db.adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); printjson(mdb.stats())})

db.adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function (c) {var s = mdb[c].stats(); printjson(s)})})

How can I check the size of indexes?

To view the size of the data allocated for an index, use one of the following procedures in the mongo shell:

Call the stats() method on the index namespace. To retrieve a list of namespaces, issue the following command:
```
db.system.namespaces.find()
```
Check the value of the indexSizes field in the output of the db.collection.stats() command.

Example

Issue the following command to retrieve index namespaces:

db.system.namespaces.find()

The command returns a list similar to the following:

{"name" : "test.orders"}
{"name" : "test.system.indexes"}
{"name" : "test.orders.$_id_"}

View the size of the data allocated for the orders.$_id_ index with the following sequence of operations:

use test
db.orders.$_id_.stats().indexSizes
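When a collection has several indexes, it can help to rank them. The following sketch sorts the indexSizes field of a db.collection.stats() document from largest to smallest; `indexesBySize`, the customer_1 index and the byte counts are hypothetical.

```javascript
// List indexes from largest to smallest using the indexSizes field.
// indexesBySize and the sample document are hypothetical.
function indexesBySize(stats) {
  var sizes = stats.indexSizes || {};
  return Object.keys(sizes)
    .map(function (name) { return { name: name, bytes: sizes[name] }; })
    .sort(function (a, b) { return b.bytes - a.bytes; });
}

// Real numbers in the mongo shell, a made-up sample elsewhere.
var indexStats = (typeof db !== "undefined")
  ? db.orders.stats()
  : { indexSizes: { "_id_": 8176, "customer_1": 16352 } };

var out = (typeof print === "function") ? print : console.log;
indexesBySize(indexStats).forEach(function (ix) {
  out(ix.name + ": " + ix.bytes + " bytes");
});
```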

How do I know when the server runs out of disk space?

If your server runs out of disk space for data files, you will see something like this in the log:

Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
Thu Aug 11 13:06:09 [FileAllocator]     will try again in 10 seconds
Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
Thu Aug 11 13:06:19 [FileAllocator]     will try again in 10 seconds

The server remains in this state forever, blocking all writes, including deletes. However, reads still work. To delete some data and then run the compact command, you must restart the server first.
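A monitoring script could watch for this condition. The sketch below scans log lines for the allocation failure shown above; `findDiskFullErrors` is a hypothetical helper, and the pattern is derived only from the sample log, so other server versions may word the message differently.

```javascript
// Detect the "No space left on device" allocation failure in log lines.
// findDiskFullErrors is hypothetical; the pattern comes from the sample log.
function findDiskFullErrors(logLines) {
  var pattern = /\[FileAllocator\] error failed to allocate new file: (\S+) size: (\d+)\s+errno:28/;
  var hits = [];
  logLines.forEach(function (line) {
    var m = pattern.exec(line);
    if (m) {
      hits.push({ file: m[1], bytes: parseInt(m[2], 10) });
    }
  });
  return hits;
}

var sampleLog = [
  "Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...",
  "Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device",
  "Thu Aug 11 13:06:09 [FileAllocator]     will try again in 10 seconds"
];
var diskErrors = findDiskFullErrors(sampleLog);

var out = (typeof print === "function") ? print : console.log;
diskErrors.forEach(function (e) {
  out("allocation failed for " + e.file + " (" + e.bytes + " bytes)");
});
```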

If your server runs out of disk space for journal files, the server process will exit. By default, mongod creates journal files in a sub-directory of dbpath named journal. You may elect to put the journal files on another storage device using a filesystem mount or a symlink.

Note

If you place the journal files on a separate storage device you will not be able to use a file system snapshot tool to capture a consistent snapshot of your data files and journal files.