Published 2015-09-14 14:46:34 | Source: compiled from the web
Backups are an important part of any operational disaster recovery plan. A good backup plan must capture data in a consistent and usable state, and operators must be able to automate both the backup and the recovery operations. Test all components of the backup system to ensure that you can recover backed-up data as needed: if you cannot effectively restore your database from a backup, the backup is useless. This document addresses higher-level backup strategies. For more information on specific backup procedures, consider the following documents:
As you develop a backup strategy for your MongoDB deployment, consider the following factors:
There are two main methodologies for backing up MongoDB instances: creating binary “dumps” of the database using mongodump, or creating filesystem-level snapshots. Both methodologies have advantages and disadvantages:
The best option depends on the requirements of your deployment and your disaster recovery needs. Typically, filesystem snapshots are preferred because of their accuracy and simplicity; however, mongodump is a viable option that is often used to generate backups of MongoDB systems.
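As an illustration of the mongodump approach, the following minimal Python sketch assembles a mongodump command line and runs it. The function names `build_mongodump_command` and `run_backup` and the host and path values in the usage note are hypothetical. The `--host`, `--port`, `--out`, and `--oplog` options are standard mongodump flags, but check them against the version you have installed.

```python
import subprocess
from typing import List


def build_mongodump_command(host: str, port: int, out_dir: str,
                            oplog: bool = False) -> List[str]:
    """Build the argument list for a mongodump run (nothing is executed here)."""
    cmd = ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]
    if oplog:
        # --oplog also records the operations that occur while the dump is
        # running; it only works against a member that maintains an oplog.
        cmd.append("--oplog")
    return cmd


def run_backup(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if mongodump exits non-zero."""
    subprocess.run(cmd, check=True)
```

A call such as `run_backup(build_mongodump_command("db1.example.net", 27017, "/backups/nightly", oplog=True))` would then write the dump to the given directory. Keeping command construction separate from execution makes the automation easy to test without a running database.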
The following topics provide details and procedures on the two approaches:
In some cases, taking backups is difficult or impossible because of large data volumes, distributed architectures, and data transmission speeds. In these situations, increase the number of members in your replica set or sets.
Important
To capture a point-in-time backup from a sharded cluster, you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
Because they are distributed systems, sharded clusters complicate backup operations. True point-in-time backups are only possible when all write activity from the application is stopped. To create a precise moment-in-time snapshot of a cluster, stop all application write activity to the database, capture a backup, and allow write operations to the database again only after the backup is complete.
However, you can capture a backup of a cluster that approximates a point-in-time backup by backing up a secondary member of each replica set that provides a shard in the cluster, at roughly the same moment. If you decide to use an approximate point-in-time backup method, ensure that your application can operate using a copy of the data that does not reflect a single moment in time.
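One way to start the per-shard backups at roughly the same moment is to launch them concurrently. The sketch below is only an illustration under stated assumptions: `backup_shards_together` is a hypothetical helper, and `backup_one` stands for whatever backup routine you supply (a snapshot call, a mongodump invocation, and so on).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def backup_shards_together(secondaries: Dict[str, str],
                           backup_one: Callable[[str], str]) -> Dict[str, str]:
    """Start one backup per shard at roughly the same moment.

    secondaries maps a shard name to the address of one of its secondaries.
    backup_one is a caller-supplied (hypothetical) function that backs up
    the member at the given address and returns a label for the result.
    """
    if not secondaries:
        return {}
    # One worker per shard, so no backup waits for another to finish first.
    with ThreadPoolExecutor(max_workers=len(secondaries)) as pool:
        futures = {shard: pool.submit(backup_one, addr)
                   for shard, addr in secondaries.items()}
        # result() re-raises any exception raised by a failed backup.
        return {shard: fut.result() for shard, fut in futures.items()}
```

The backups start within a short interval of one another but are not synchronized, so the combined result is still only an approximation of a single point in time.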
The following documents describe sharded cluster related backup procedures:
In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It is possible to lock a single secondary database and then create a backup from that instance. When you unlock the database, the secondary will catch up with the primary. You may also choose to deploy a dedicated hidden member for backup purposes.
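The lock, back up, unlock sequence described above can be sketched as a Python context manager. This is a minimal sketch, not a definitive procedure: `locked_for_backup` is a hypothetical name, `admin_db` is assumed to behave like pymongo's `client.admin` (an object with a `command(name, **kwargs)` method), and the server is assumed to accept the `fsync` command with `lock` and the `fsyncUnlock` command, which depends on the server version.

```python
from contextlib import contextmanager


@contextmanager
def locked_for_backup(admin_db):
    """Flush and lock a secondary for the duration of a backup.

    admin_db is assumed to behave like pymongo's client.admin, i.e. to
    expose command(name, **kwargs).
    """
    # fsync with lock=True flushes pending writes to disk and blocks new ones.
    admin_db.command("fsync", lock=True)
    try:
        yield admin_db
    finally:
        # Always unlock, even if the backup step raised, so the secondary
        # can resume replication and catch up with the primary.
        admin_db.command("fsyncUnlock")
```

The backup step itself goes inside the `with locked_for_backup(...)` block. Placing the unlock in `finally` means a failed backup does not leave the secondary locked.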
If you have a sharded cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.
For any cluster, using a non-primary node to create backups is particularly advantageous because the backup operation does not affect the performance of the primary. Replication itself provides some measure of redundancy. Nevertheless, it is crucial to keep point-in-time backups of your cluster, both for disaster recovery and as an additional layer of protection.