This document provides an overview of indexes in MongoDB, including index types and creation options. For operational guidelines and procedures, see the Indexing Operations document. For strategies and practical approaches, see the Indexing Strategies document.
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
MongoDB indexes have the following core features:
MongoDB defines indexes on a per-collection level.
You can create indexes on a single field or on multiple fields using a compound index.
Indexes enhance query performance, often dramatically. However, each index also incurs some overhead for every write operation. Consider the queries, the frequency of these queries, the size of your working set, the insert load, and your application’s requirements as you create indexes in your MongoDB environment.
All MongoDB indexes use a B-tree data structure. MongoDB can use this representation of the data to optimize query responses.
Every query, including update operations, uses one and only one index. The query optimizer selects the index empirically by occasionally running alternate query plans and selecting the plan with the best response time for each query type. You can override the query optimizer using the cursor.hint() method.
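For instance, here is a minimal sketch of overriding the optimizer with cursor.hint(), assuming a hypothetical records collection that already has an index on the status field:
db.records.find( { status: "A", qty: { $lt: 30 } } ).hint( { status: 1 } )  // force use of the { status: 1 } index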
An index “covers” a query if all the fields in the query are part of that index and all the fields returned by the query are in the same index.
When an index covers a query, the server can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. Querying the index can be faster than querying the documents outside of the index.
See Create Indexes that Support Covered Queries for more information.
Using queries with good index coverage reduces the number of full documents that MongoDB needs to store in memory, thus maximizing database performance and throughput.
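As an illustrative sketch (the users collection and user_id field here are hypothetical), an index on user_id can cover a query that filters on user_id and projects only that field; the projection must exclude _id because _id is not part of the index:
db.users.ensureIndex( { user_id: 1 } )
db.users.find( { user_id: "abc123" }, { user_id: 1, _id: 0 } )  // satisfied entirely from the index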
If an update does not change the size of a document or cause the document to outgrow its allocated area, then MongoDB will update an index only if the indexed fields have changed. This improves performance. Note that if the document has grown and must move, all index keys must then update.
This section enumerates the types of indexes available in MongoDB. For all collections, MongoDB creates the default _id index. You can create additional indexes with the ensureIndex() method on any single field or sequence of fields within any document or sub-document. MongoDB also supports indexes of arrays, called multi-key indexes.
The _id index is a unique index [1] on the _id field, and MongoDB creates this index by default on all collections. [2] You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectId, generated on every insert() operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
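For example (the members collection here is hypothetical), an insert without an _id receives an auto-generated ObjectId, while an insert may also supply any unique value explicitly:
db.members.insert( { name: "Ada" } )                // _id defaults to a new ObjectId
db.members.insert( { _id: 1001, name: "Ada II" } )  // any unique value is acceptable as _id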
Note
In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated ObjectId.
[1] Although the index on _id is unique, the getIndexes() method will not print unique: true in the mongo shell.
[2] Before version 2.2, capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database. See the release notes for more information.
All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match multiple components using the index while scanning fewer whole documents.
In general, you should create indexes that support your primary, common, and user-facing queries. Doing so requires MongoDB to scan the fewest number of documents possible.
In the mongo shell, you can create an index by calling the ensureIndex() method. Arguments to ensureIndex() resemble the following:
{ "field": 1 }
{ "product.quantity": 1 }
{ "product": 1, "quantity": 1 }
For each field in the index specify either 1 for an ascending order or -1 for a descending order, which represents the order of the keys in the index. For indexes with more than one key (i.e. compound indexes) the sequence of fields is important.
You can create indexes on fields that hold sub-documents as in the following example:
Example
Given the following document in the factories collection:
{ "_id": ObjectId(...), metro: { city: "New York", state: "NY" } } )
You can create an index on the metro key. The following queries would then use that index, and both would return the above document:
db.factories.find( { metro: { city: "New York", state: "NY" } } );
db.factories.find( { metro: { $gte : { city: "New York" } } } );
The second query returns the document because { city: "New York" } is less than { city: "New York", state: "NY" }. The order of comparison is in ascending key order, in the order the keys occur in the BSON document.
You can create indexes on fields in sub-documents, just as you can index top-level fields in documents. [3] These indexes allow you to use a “dot notation,” to introspect into sub-documents.
Consider a collection named people that holds documents that resemble the following example document:
{"_id": ObjectId(...)
"name": "John Doe"
"address": {
"street": "Main"
"zipcode": 53511
"state": "WI"
}
}
You can create an index on the address.zipcode field, using the following specification:
db.people.ensureIndex( { "address.zipcode": 1 } )
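A query using the same dot notation can then use this index; for example, to find the document above by zip code:
db.people.find( { "address.zipcode": 53511 } )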
[3] Indexes on sub-documents, by contrast, allow you to index fields that hold documents, including the full content of the sub-document, up to the maximum index size.
MongoDB supports “compound indexes,” where a single index structure holds references to multiple fields within a collection’s documents. Consider a collection named products that holds documents that resemble the following document:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "location": "4th Street Store",
  "stock": 4,
  "type": "cases",
  "arrival": Date(...)
}
If most of your application's queries include the item field and a significant number of queries also check the stock field, you can specify a single compound index to support both of these queries:
db.products.ensureIndex( { "item": 1, "location": 1, "stock": 1 } )
Compound indexes support queries on any prefix of the fields in the index. [4] For example, MongoDB can use the above index to support queries that select the item field and to support queries that select both the item field and the location field. The index, however, would not support queries that select only the location field, only the stock field, or only the location and stock fields.
When creating an index, the number associated with a key specifies the direction of the index. The options are 1 (ascending) and -1 (descending). Direction doesn’t matter for single key indexes or for random access retrieval but is important if you are doing sort queries on compound indexes.
The order of fields in a compound index is very important. In the previous example, the index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by the values of location, and then sorted by values of the stock field.
[4] Index prefixes are the beginning subsets of fields. For example, given the index { a: 1, b: 1, c: 1 }, both { a: 1 } and { a: 1, b: 1 } are prefixes of the index.
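For illustration, with the { "item": 1, "location": 1, "stock": 1 } index above, a sketch of how prefix support plays out:
db.products.find( { item: "Banana" } )                                // can use the index (prefix { item: 1 })
db.products.find( { item: "Banana", location: "4th Street Store" } )  // can use the index (prefix { item: 1, location: 1 })
db.products.find( { location: "4th Street Store" } )                  // not a prefix; the index does not support this query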
Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of keys doesn’t matter, because MongoDB can traverse the index in either direction. However, for compound indexes, if you need to order results against two fields, sometimes you need the index fields running in opposite order relative to each other.
To specify an index with a descending order, use the following form:
db.products.ensureIndex( { "field": -1 } )
More typically in the context of a compound index, the specification would resemble the following prototype:
db.products.ensureIndex( { "fieldA": 1, "fieldB": -1 } )
Consider a collection of event data that includes both usernames and a timestamp. Suppose you want to return a list of events sorted by username, with the most recent events first. To create this index, use the following command:
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
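For example (the username value here is hypothetical), this index can support matching on a username and sorting that user's events with the most recent first:
db.events.find( { username: "joe" } ).sort( { timestamp: -1 } )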
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a “multikey index.”
Example
Given the following document:
{ "_id" : ObjectId("..."),
"name" : "Warm Weather",
"author" : "Steve",
"tags" : [ "weather", "hot", "record", "april" ] }
Then an index on the tags field would be a multikey index and would include these separate entries:
{ tags: "weather" }
{ tags: "hot" }
{ tags: "record" }
{ tags: "april" }
Queries could use the multikey index to select documents that match any of the above values.
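As a sketch, assuming the document above lives in a hypothetical blog collection:
db.blog.ensureIndex( { tags: 1 } )       // multikey index: one entry per array value
db.blog.find( { tags: "april" } )        // uses the multikey index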
You can use multikey indexes to index fields within objects embedded in arrays, as in the following example:
Example
Consider a feedback collection with documents in the following form:
{
  "_id": ObjectId(...),
  "title": "Grocery Quality",
  "comments": [
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the cheddar selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the mustard selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the olive selection." }
  ]
}
An index on the comments.text field would be a multikey index and would add items to the index for all of the sub-documents in the array.
With an index such as { "comments.text": 1 }, consider the following query:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )
This would select the document above, because its comments array contains the following sub-document:
{ author_id: ObjectId(...),
  date: Date(...),
  text: "Please expand the olive selection." }
Compound Multikey Indexes May Only Include One Array Field
While you can create multikey compound indexes, at most one field in a compound index may hold an array. For example, given an index on { a: 1, b: 1 }, the following documents are permissible:
{a: [1, 2], b: 1}
{a: 1, b: [1, 2]}
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection with the {a: 1, b: 1 } index:
{a: [1, 2], b: [1, 2]}
If you attempt to insert such a document, MongoDB will reject the insertion and produce an error that says cannot index parallel arrays. MongoDB does not index parallel arrays because doing so would require the index to include each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult-to-maintain indexes.
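A minimal sketch of this restriction, assuming a hypothetical test collection:
db.test.ensureIndex( { a: 1, b: 1 } )
db.test.insert( { a: [1, 2], b: 1 } )       // permissible: only one indexed field holds an array
db.test.insert( { a: [1, 2], b: [1, 2] } )  // rejected with "cannot index parallel arrays"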
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index on the user_id field of the addresses collection, use the following operation in the mongo shell:
db.addresses.ensureIndex( { "user_id": 1 }, { unique: true } )
By default, unique is false on MongoDB indexes.
If you apply the unique constraint to a compound index, MongoDB will enforce uniqueness on the combination of values rather than on any individual field of the key.
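For example (the members collection and group_id field here are hypothetical), a unique compound index enforces uniqueness on the pair of values, so two documents may share a user_id as long as their group_id values differ:
db.members.ensureIndex( { "group_id": 1, "user_id": 1 }, { unique: true } )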
If a document does not have a value for the indexed field in a unique index, the index will store a null value for that document. Because of the unique constraint, MongoDB will permit only one document without a value for the indexed field in the collection. You can combine the unique constraint with a sparse index to filter these null values from the unique index.
Sparse indexes only contain entries for documents that have the indexed field. [5] Any document that is missing the field is not indexed. The index is “sparse” because it does not include every document in the collection.
By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do not contain the indexed field. To create a sparse index on the xmpp_id field of the addresses collection, use the following operation in the mongo shell:
db.addresses.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
By default, sparse is false on MongoDB indexes.
Warning
Using these indexes can sometimes produce incomplete result sets when filtering or sorting, because sparse indexes do not contain entries for every document in a collection.
Note
Do not confuse sparse indexes in MongoDB with block-level indexes in other databases. Think of them as dense indexes with a specific filter.
You can combine the sparse option with the unique option so that mongod will reject documents that have duplicate values for a field but ignore documents that do not have the key.
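For example, to combine the two options on the xmpp_id field used above:
db.addresses.ensureIndex( { "xmpp_id": 1 }, { sparse: true, unique: true } )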
[5] All documents that have the indexed field are indexed in a sparse index, even if that field stores a null value in some documents.
You specify index creation options in the second argument in ensureIndex().
The options sparse, unique, and TTL affect the kind of index that MongoDB creates. This section addresses background construction and duplicate dropping, which affect how MongoDB builds indexes.
By default, creating an index is a blocking operation. Building an index on a large collection of data can take a long time to complete. To resolve this issue, the background option can allow you to continue to use your mongod instance during the index build.
For example, to create an index in the background of the zipcode field of the people collection you would issue the following:
db.people.ensureIndex( { zipcode: 1}, {background: true} )
By default, background is false for building MongoDB indexes.
You can combine the background option with other options, as in the following:
db.people.ensureIndex( { zipcode: 1}, {background: true, sparse: true } )
Be aware of the following behaviors with background index construction:
A mongod instance can build only one background index per database at a time.
Changed in version 2.2: Before 2.2, a single mongod instance could build only one index at a time.
The indexing operation runs in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. Open another connection or mongo instance to continue issuing commands to the database.
Background index operations use an incremental approach that is slower than a normal “foreground” index build. If the index is larger than the available RAM, the incremental process can take much longer than a foreground build.
If your application includes ensureIndex() operations and the index does not already exist, building the index at that point can have a severe impact on the performance of the database.
Make sure that your application checks for the indexes at startup using the getIndexes() method, or the equivalent method for your driver, and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance windows.
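For example, in the mongo shell you can list the existing indexes on a collection and verify that the expected ones are present:
db.people.getIndexes()   // returns an array of index documents for the collection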
Building Indexes on Secondaries
Background index operations on a replica set primary become foreground indexing operations on secondary members of the set. All indexing operations on secondaries block replication.
To build large indexes on secondaries, the best approach is to restart one secondary at a time in standalone mode and build the index. After building the index, restart it as a member of the replica set, allow it to catch up with the other members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary.
Remember, the amount of time required to build the index on a secondary node must be within the window of the oplog, so that the secondary can catch up with the primary.
See Build Indexes on Replica Sets for more information on this process.
Indexes on secondary members in “recovering” mode are always built in the foreground to allow them to catch up as soon as possible.
See Build Indexes on Replica Sets for a complete procedure for rebuilding indexes on secondaries.
Note
If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection, including running repairDatabase, dropping that collection (i.e. db.collection.drop()), and running compact. These operations will return an error during background index builds.
Queries will not use these indexes until the index build is complete.
MongoDB cannot create a unique index on a field that has duplicate values. To force the creation of a unique index, you can specify the dropDups option, which will index only the first occurrence of a value for the key and delete all subsequent documents that contain a duplicate value.
Warning
As in all unique indexes, if a document does not have the indexed field, MongoDB will include it in the index with a “null” value.
If subsequent documents do not have the indexed field, and you have set {dropDups: true}, MongoDB will remove these documents from the collection when creating the index. If you combine dropDups with the sparse option, the index will include only documents that have the indexed field, and documents without the field will remain in the database.
To create a unique index that drops duplicates on the username field of the accounts collection, use a command in the following form:
db.accounts.ensureIndex( { username: 1 }, { unique: true, dropDups: true } )
Warning
Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
By default, dropDups is false.
TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for some types of information like machine generated event data, logs, and session information that only need to persist in a database for a limited amount of time.
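As a sketch, assuming a hypothetical eventlog collection whose createdAt field holds a BSON date, the expireAfterSeconds option turns an ordinary single-field index into a TTL index:
db.eventlog.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )  // remove documents one hour after createdAt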
These indexes have the following limitations: the indexed field must hold values of the BSON date type, compound indexes are not supported, and you cannot create a TTL index on the _id field or on a capped collection.
Note
TTL indexes expire data by removing documents in a background task that runs once a minute. As a result, the TTL index provides no guarantee that expired documents will not exist in the collection; expired documents may remain until the next run of the background task removes them.
In all other respects, TTL indexes are normal indexes, and if appropriate, MongoDB can use these indexes to fulfill arbitrary queries.
See Expire Data from Collections by Setting TTL for more information.
MongoDB provides “geospatial indexes” to support location-based and other similar queries in a two-dimensional coordinate system. For example, use geospatial indexes when you need to take a collection of documents that have coordinates and return the documents that are “near” a given coordinate pair.
To create a geospatial index, your documents must have a coordinate pair. For maximum compatibility, these coordinate pairs should be in the form of a two-element array, such as [ x, y ]. Given a field loc that holds a coordinate pair in the places collection, you would create a geospatial index as follows:
db.places.ensureIndex( { loc : "2d" } )
MongoDB will reject documents that have values in the loc field beyond the minimum and maximum values.
Note
MongoDB permits only one geospatial index per collection. Although MongoDB will allow clients to create multiple geospatial indexes, a single query can use only one index.
See the $near, and the database command geoNear for more information on accessing geospatial data.
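For example, using the 2d index on loc created above, a $near query returns documents ordered by proximity to a coordinate pair (the coordinates here are hypothetical):
db.places.find( { loc: { $near: [ -73.97, 40.77 ] } } ).limit(10)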
In addition to conventional geospatial indexes, MongoDB also provides a bucket-based geospatial index, called “geospatial haystack indexes.” These indexes support high performance queries for locations within a small area, when the query must filter along another dimension.
Example
If you need to return all documents that have coordinates within 25 miles of a given point and have a type field value of “museum,” a haystack index would provide the best support for these queries.
Haystack indexes allow you to tune your bucket size to the distribution of your data, so that in general you search only very small regions of 2d space for a particular kind of document. These indexes are not suited for finding the closest documents to a particular location, when the closest documents are far away compared to bucket size.
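A minimal sketch of creating a haystack index, assuming the places collection above with loc coordinates and a type field; the bucketSize value is an assumption you would tune to the distribution of your data:
db.places.ensureIndex( { loc: "geoHaystack", type: 1 }, { bucketSize: 1 } )  // bucketSize groups coordinates within 1 unit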
Be aware of the following behaviors and limitations:
A collection may have no more than 64 indexes.
Index keys can be no larger than 1024 bytes.
Documents with fields that have values greater than this size cannot be indexed.
To query for documents that were too large to index, you can use a command similar to the following:
db.myCollection.find({<key>: <value too large to index>}).hint({$natural: 1})
The name of an index, including the namespace, must be shorter than 128 characters.
Indexes have storage requirements and impact insert and update speed to some degree.
Create indexes to support queries and other operations, but do not maintain indexes that your MongoDB instance cannot or will not use.
For queries with the $or operator, each clause of an $or query executes in parallel, and can each use a different index.
For queries that use the sort() method and use the $or operator, the query cannot use the indexes on the $or fields.
2d geospatial queries do not support queries that use the $or operator.