Published 2015-09-14 14:53:34 | Source: compiled from the web
All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.
This document introduces the write operators available in MongoDB and presents strategies to increase the efficiency of writes in applications.
For information on write operators and how to write data to a MongoDB database, see the following pages:
For information on specific methods used to perform write operations in the mongo shell, see the following:
For information on how to perform write operations from within an application, see the Drivers and Client Libraries documentation or the documentation for your client library.
The driver write concern change introduced a new connection class, MongoClient, in all of the MongoDB drivers, with a different default write concern. See the release notes for this change, and the release notes for the driver you’re using, for more information about your driver’s release.
Clients issue write operations with some level of write concern, which describes the level of concern or guarantee the server will provide in its response to a write operation. Consider the following levels of conceptual write concern:
errors ignored: Write operations are not acknowledged by MongoDB, and may not succeed in the case of connection errors that the client is not yet aware of, or if the mongod produces an exception (e.g. a duplicate key exception for unique indexes). While this operation is efficient because it does not require the database to respond to every write operation, it also incurs a significant risk with regard to the persistence and durability of the data.
Do not use this option in normal operation.
unacknowledged: MongoDB does not acknowledge the receipt of write operations, as with the errors ignored level; however, the driver will receive and handle network errors where possible, given the system networking configuration.
Before the releases outlined in Default Write Concern Change, this was the default write concern.
journaled: The mongod will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod shutdown and ensures that the write operation is durable.
While receipt acknowledged without journaled provides the fundamental basis for write concern, there is a window of up to 100 milliseconds between journal commits during which the write operation is not fully durable. Require journaled as part of the write concern to provide this durability guarantee.
Replica sets present an additional layer of consideration for write concern. Basic write concern levels affect the write operation on only one mongod instance. The w argument to getLastError provides a replica acknowledged level of write concern. With replica acknowledged you can guarantee that the write operation has propagated to the members of a replica set. See the Write Concern for Replica Sets document for more information.
Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the primary of the set regardless of the level of replica acknowledged write concern.
[1] | The default write concern is to call getLastError with no arguments. For replica sets, you can define the default write concern settings in getLastErrorDefaults. If getLastErrorDefaults does not define a default write concern setting, getLastError defaults to basic receipt acknowledgment. |
To provide write concern, drivers issue the getLastError command after a write operation and receive a document with information about the last operation. This document’s err field contains either:
The definition of a “successful write” depends on the arguments specified to getLastError, or in replica sets, the configuration of getLastErrorDefaults. When deciding the level of write concern for your application, become familiar with the Operational Considerations and Write Concern.
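As a rough illustration of how a client might interpret the returned document, the sketch below treats an err value of null as success and anything else as the error description. The function name checkLastError and the two sample documents are hand-written assumptions for illustration, not output captured from a server.

```javascript
// Interpret a document shaped like the one getLastError returns.
// Assumption: err is null on success and holds an error description otherwise.
function checkLastError(result) {
  if (result.err === null) {
    return { success: true, error: null };
  }
  return { success: false, error: result.err };
}

// Hand-written sample documents (illustrative only).
const okResult = checkLastError({ err: null, n: 0, ok: 1 });
const failedResult = checkLastError({
  err: "E11000 duplicate key error",
  code: 11000,
  ok: 1
});

console.log(okResult.success);     // true
console.log(failedResult.success); // false
console.log(failedResult.error);   // the error description string
```

What counts as success still depends on the arguments passed to getLastError, so a null err only means the requested level of write concern was met.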
The getLastError command has the following options to configure write concern requirements:
j or “journal” option
This option confirms that the mongod instance has written the data to the on-disk journal and ensures data is not lost if the mongod instance shuts down unexpectedly. Set to true to enable, as shown in the following example:
db.runCommand( { getLastError: 1, j: true } )
If you set journal to true, and the mongod does not have journaling enabled, as with nojournal, then getLastError will provide basic receipt acknowledgment, and will include a jnote field in its return document.
w option
This option can disable write concern entirely, and it also specifies the write concern for replica sets. See Operational Considerations and Write Concern for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod instance or on the primary in a replica set.
The w option takes the following values:
-1:
Disables all acknowledgment of write operations, and suppresses all errors, including network and socket errors.
0:
Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.
If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that mongod acknowledge the write operation.
1:
Provides acknowledgment of write operations on a standalone mongod or the primary in a replica set.
A number greater than 1:
Guarantees that write operations have propagated successfully to the specified number of replica set members including the primary. If you set w to a number that is greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.
majority:
Confirms that write operations have propagated to a majority of the configured replica set members: a majority of nodes must acknowledge the write operation before it succeeds. This ensures that the write operation will never be subject to a rollback in the course of normal operation, and furthermore allows you to avoid hard coding assumptions about the size of your replica set into your application.
A tag set:
By specifying a tag set you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
getLastError also supports a wtimeout setting, which allows clients to specify a timeout for the write concern. If you don’t specify wtimeout and the mongod cannot fulfill the write concern, getLastError will block, potentially forever.
For more information on write concern and replica sets, see Write Concern for Replica Sets.
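The options described above can be combined into a single command document. The following is a minimal sketch of a helper that assembles one; buildGetLastError is a hypothetical name, it only handles the w, j, and wtimeout fields named in this section, and it assumes wtimeout is given in milliseconds. It builds the document only and does not contact a server.

```javascript
// Assemble a getLastError command document from write concern options.
// buildGetLastError is a hypothetical helper, not part of any driver.
function buildGetLastError(options) {
  const command = { getLastError: 1 };

  if (options.w !== undefined) {
    const w = options.w;
    const validNumber = Number.isInteger(w) && w >= -1;
    // A string is either "majority" or the name of a tag set.
    const validString = typeof w === "string" && w.length > 0;
    if (!validNumber && !validString) {
      throw new Error('w must be an integer >= -1, "majority", or a tag set name');
    }
    command.w = w;
  }

  if (options.j === true) {
    command.j = true;
  }

  if (options.wtimeout !== undefined) {
    if (!Number.isInteger(options.wtimeout) || options.wtimeout < 0) {
      throw new Error("wtimeout must be a non-negative integer (milliseconds)");
    }
    command.wtimeout = options.wtimeout;
  }

  return command;
}

// Wait for a majority of members, but give up after 5 seconds
// instead of blocking indefinitely.
const majorityCommand = buildGetLastError({ w: "majority", wtimeout: 5000 });
console.log(JSON.stringify(majorityCommand));
```

In the mongo shell, a document like this would be passed to db.runCommand(). Setting wtimeout whenever w is greater than 1 is a reasonable precaution, since a w value that the set cannot satisfy otherwise blocks forever.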
In sharded clusters, mongos instances will pass write concern on to the shard mongod instances.
In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.
The insert() method, when passed an array of documents, will perform a bulk insert, and inserts each document atomically. Drivers provide their own interface for this kind of operation.
New in version 2.2: insert() in the mongo shell gained support for bulk inserts.
Bulk insert can significantly increase performance by amortizing write concern costs. In the drivers, you can configure write concern for batches rather than on a per-document level.
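One way to amortize the cost is to split a large set of documents into fixed-size batches and request acknowledgment once per batch. The sketch below shows only the batching step; toBatches and the batch size of 1000 are illustrative assumptions, and no database call is made.

```javascript
// Split an array of documents into batches of at most batchSize documents.
// toBatches is a hypothetical helper for illustration.
function toBatches(docs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// 2500 placeholder documents split into batches of 1000.
const sampleDocs = Array.from({ length: 2500 }, (_, i) => ({ _id: i }));
const sampleBatches = toBatches(sampleDocs, 1000);
console.log(sampleBatches.map((batch) => batch.length)); // [ 1000, 1000, 500 ]
```

Each batch could then be passed as an array to insert(), with write concern checked once per batch rather than once per document.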
Drivers also have a ContinueOnError option in their insert operation, so that the bulk operation will continue to insert remaining documents in a batch even if an insert fails.
New in version 2.0: Support for ContinueOnError depends on version 2.0 of the core mongod and mongos components.
If the bulk insert process generates more than one error in a batch job, the client will only receive the most recent error. All bulk operations to a sharded collection run with ContinueOnError, which applications cannot disable. See the Strategies for Bulk Inserts in Sharded Clusters section for more information on considerations for bulk inserts in sharded clusters.
For details on performing bulk inserts in your application, see your driver documentation. Also consider the following resources: Sharded Clusters, Strategies for Bulk Inserts in Sharded Clusters, and Importing and Exporting Data.
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. [2]
In general, the performance gains that indexes provide for read operations are worth the insertion penalty. However, when optimizing write performance, be careful when creating new indexes, and always evaluate the existing indexes on the collection to ensure that your queries actually use them.
For more information on indexes in MongoDB consider Indexes and Indexing Strategies.
[2] | For inserts and updates to un-indexed fields, the overhead for sparse indexes is less than for non-sparse indexes. Also for non-sparse indexes, updates that don’t change the record size have less indexing overhead. |
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.
No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits.
Each document in a MongoDB collection has allocated record space which includes the entire document and a small amount of padding. This padding makes it possible for update operations to increase the size of a document slightly without causing the document to outgrow the allocated record size.
Documents in MongoDB can grow up to the full maximum BSON document size. However, when documents outgrow their allocated record size, MongoDB must allocate a new record and move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use data models that minimize the need for document growth when possible.
For complete examples of update operations, see Update.
If an update operation does not cause the document to increase in size, MongoDB can apply the update in-place. Some updates change the size of the document. For example, using the $push operator to append a sub-document to an array can cause the top level document to grow beyond its allocated space.
When documents grow, MongoDB relocates the document on disk to a location with enough contiguous space to hold it. These relocations take longer than in-place updates, particularly if the collection has indexes, because MongoDB must update all index entries. If a collection has many indexes, the move will impact write throughput.
To minimize document movements, MongoDB employs padding. MongoDB adaptively learns if documents in a collection tend to grow, and if they do, adds a paddingFactor so that the documents have room to grow on subsequent writes. The paddingFactor indicates the padding for new inserts and moves.
New in version 2.2: You can use the collMod command with the usePowerOf2Sizes flag so that MongoDB allocates document space in sizes that are powers of 2. This helps ensure that MongoDB can efficiently reuse the space freed as a result of deletions or document relocations. As with all padding, using document space allocations with power of 2 sizes minimizes, but does not eliminate, document movements.
To check the current paddingFactor on a collection, you can run the db.collection.stats() operation in the mongo shell and read the paddingFactor field of the returned document.
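To get a feel for what power of 2 sizing means for a given document, the sketch below rounds a size in bytes up to the next power of 2. This is illustrative arithmetic only: nextPowerOfTwo is a hypothetical helper, and the server's actual allocator may apply minimum sizes or other rules not described here.

```javascript
// Round a size in bytes up to the nearest power of 2.
// Illustrative only; not MongoDB's actual allocation code.
function nextPowerOfTwo(bytes) {
  if (!Number.isInteger(bytes) || bytes < 1) {
    throw new Error("bytes must be a positive integer");
  }
  let size = 1;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

console.log(nextPowerOfTwo(1000)); // 1024
console.log(nextPowerOfTwo(1024)); // 1024
console.log(nextPowerOfTwo(1025)); // 2048
```

Because every record size comes from the same small set of values, space freed by one record is more likely to fit another, which is the reuse benefit the text describes.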
Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor, for example:
padding size = (paddingFactor - 1) * <document size>.
For example, a paddingFactor of 1.0 specifies no padding whereas a paddingFactor of 1.5 specifies a padding size of 0.5 or 50 percent (50%) of the document size.
Because the paddingFactor is relative to the size of each document, you cannot calculate the exact amount of padding for a collection based on the average document size and padding factor.
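The per-document formula above can be written as a small function. This is a sketch of the arithmetic only; paddingSize is a hypothetical helper name, and the document sizes are made-up values in bytes.

```javascript
// padding size = (paddingFactor - 1) * <document size>
// paddingSize is a hypothetical helper for illustration.
function paddingSize(paddingFactor, documentSize) {
  if (paddingFactor < 1) {
    throw new Error("paddingFactor is expected to be at least 1.0");
  }
  return (paddingFactor - 1) * documentSize;
}

console.log(paddingSize(1.0, 1000)); // 0   (no padding)
console.log(paddingSize(1.5, 1000)); // 500 (50% of the document size)
```

The result applies to a single document written under a given paddingFactor; since the factor changes over time, documents written at different points carry different amounts of padding.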
If an update operation causes the document to decrease in size, for instance if you perform an $unset or a $pop update, the document remains in place and effectively has more padding. If the document remains this size, the space is not reclaimed until you perform a compact or a repairDatabase operation.
The following operations remove padding: compact, repairDatabase, and replica set initial sync.
However, with the compact command, you can run the command with a paddingFactor or a paddingBytes parameter.
Padding is also removed if you use mongoexport from a collection. If you use mongoimport into a new collection, mongoimport will not add padding. If you use mongoimport with an existing collection with padding, mongoimport will not affect the existing padding.
When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection’s padding factor grows. Padding does not affect in-place updates, and after compact, repairDatabase, and replica set initial sync the collection will require less storage.
In replica sets, all write operations go to the set’s primary, which applies the write operation then records the operations on the primary’s operation log or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set are continuously replicating the oplog and applying the operations to themselves in an asynchronous process.
Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate: this can cause the secondary’s state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly failover in the form of rollbacks as well as general read consistency.
To help avoid this issue, you can customize the write concern to return confirmation of the write operation to another member [3] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
For more information on replica sets and write operations, see Write Concern, Oplog, Oplog Internals, and Changing Oplog Size.
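The pattern of confirming replication every N operations can be sketched as a loop that calls a confirmation step at a fixed interval. In the sketch below, runWithPeriodicConfirmation, applyWrite, and confirmReplication are hypothetical names; the two callbacks are stand-ins for a real insert and for a getLastError call with a w value of 2 or majority, so nothing here talks to a database.

```javascript
// Apply each operation, and pause for replication confirmation
// after every `interval` operations. All names are hypothetical.
function runWithPeriodicConfirmation(operations, interval, applyWrite, confirmReplication) {
  if (!Number.isInteger(interval) || interval < 1) {
    throw new Error("interval must be a positive integer");
  }
  let confirmations = 0;
  operations.forEach((operation, index) => {
    applyWrite(operation);
    if ((index + 1) % interval === 0) {
      confirmReplication(); // stand-in for getLastError with w: 2 or "majority"
      confirmations += 1;
    }
  });
  return confirmations;
}

// Stand-in callbacks that only record what happened.
const writtenDocs = [];
let confirmCalls = 0;
const pendingDocs = Array.from({ length: 2500 }, (_, i) => ({ _id: i }));

const confirmationCount = runWithPeriodicConfirmation(
  pendingDocs,
  1000,
  (doc) => writtenDocs.push(doc),
  () => { confirmCalls += 1; }
);

console.log(writtenDocs.length); // 2500
console.log(confirmationCount);  // 2 (after the 1000th and 2000th operations)
```

A smaller interval keeps secondaries closer to the primary at the cost of write throughput; a larger one does the opposite.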
[3] | Calling getLastError intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary. |
In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
For more information, see Sharded Cluster Administration and Bulk Inserts.