smali

пятница, 21 сентября 2018 г.

MongoDB: Index Usage and MongoDB explain() (Part 1)

https://dzone.com/articles/mongodb-index-usage-and-mongodb-explain-part-1

In this two-part series, I'm going to explain explain. No, the repetition is not a pun or a typo. I'm talking about explain(), the most useful MongoDB feature to investigate a query. Here, I'll introduce the index types available in MongoDB, their properties, and how to create and use them. In the next article, we'll see explain() in action on some real examples.
Using explain(), questions like these will find a proper answer:
  • Is the query using an index?
  • How is it using the index?
  • How many documents are scanned?
  • How many documents are returned?
  • For how many milliseconds does the query run?
And many others. The explain() command provides a lot of information.
If you are familiar with MySQL, then you probably know about the command EXPLAIN. In MongoDB, we have a similar feature, and the goal of using it is exactly the same: find out how we can improve a query in order to reduce the execution time as far as possible. Based on the information provided, we can decide to create a new index or to rewrite the query to take advantage of indexes that already exist. The same as you do when using MySQL.


Indexes Supported by MongoDB

The concept of an index in MongoDB is the same as in relational databases. An index is generally a small structure — in comparison to the collection size — that provides a better way to access documents more quickly. Without an index, the only way that MongoDB has to retrieve the documents is to do a collection scan: reading sequentially all the documents in the collection. This is exactly the same as the full scan table in MySQL. Another similarity to MySQL is that the indexes in MongoDB have a structure based on the well known B-Tree. To understand explain() you'll need to understand the index structures.
Let's have an overview of the index types available in MongoDB and their features and properties. We focus, in particular, on the single, compound, and multikey index types. These are by far the most used and most useful in the majority of cases. We'll also present the other types, but when using the explain(), in the second part of this article series, we'll use only single field and compound indexes in the examples.

Single Field Indexes

This is an index built on a single field of the documents. The entries could be a single value, such as a string or a number, but also could be an embedded document.
By default, each collection has a single field index automatically created on the _id field, the primary key. The index can be defined in ascending or descending order.
Let's see some examples.
Assume we have a collection people containing the following type of document:
{
  "_id": 1,
  "person": { name: "John", surname: "Brown" },
  "age": 34,
  "city": "New York"
}
We can define, for example, a single field index on the age field.
db.people.createIndex( { age : 1} )
In this case, we have created an ascending index. If we had wanted to create a descending index, we would have used this syntax:
db.people.createIndex( {age : -1} )
With this kind of index, we can improve all the queries that find documents with a condition and the agefield, like the following:
db.people.find( { age : 20 } ) 
db.people.find( { name : "Antony", age : 30 } )
db.people.find( { age : { $gt : 25} } )
It's interesting to highlight that the index can be used to improve even the sorting of the results. In this case, it doesn't matter whether you have defined the index as ascending or descending. MongoDB can traverse the items in the index in both directions.
db.people.find().sort( { age: 1} )
db.people.find().sort( {age : -1} )
Both these queries use the index to retrieve all of the documents in the specified order.
The next query can use the index to retrieve the documents in the order specified by the sort.
db.people.find( { age : { $lt : 25 } } ).sort( { age : -1} )

Indexes on Embedded Documents

We can even define an index on an embedded document.
db.people.createIndex( { person: 1 } )
In this case, each item in the index is the embedded document as a whole. The index can be used when the condition in the query matches exactly the embedded document. This query can retrieve the document using the index:
db.people.find( { person : {name : "John", surname: "Brown" } } )
But the next two examples are not able to use the index, and MongoDB will do a collection scan:
db.people.find( { person : {surname : "Brown", name: "John" } } )
db.prople.find( person.name: "John", person.surname: "Brown" )
It is more useful to create indexes on embedded fields rather than on embedded documents. For doing this, we can use the dot notation.
db.people.createIndex( { "person.name": 1} )
db.people.createIndex( { "person.surname": 1 } )
We have created two separate ascending indexes on the name and surname embedded fields. Now, queries like the following can rely on the new indexes to be resolved.
db.people.find( { "person.name": "John" } )
db.people.find( { "person.surname": "Brown, "person.name": "John" } )

Compound Indexes

A compound index is an index on multiple fields. Using the same people collection, we can create a compound index combining the city and age field.
db.people.createIndex( {city: 1, age: 1, person.surname: 1  } )
In this case, we have created a compound index where the first entry is the value of city field; the second is the value of the age field, and the third is the person.name. All the fields here are defined in ascending order.
Queries such as the following can benefit from the index:
db.people.find( { city: "Miami", age: { $gt: 50 } } )
db.people.find( { city: "Boston" } )
db.people.find( { city: "Atlanta", age: {$lt: 25}, "person.surname": "Green" } )
The order we define the fields is important. In fact, only the left prefix of the index can be used to retrieve the documents. In the examples above, the index can be used because the fields in the find() are always a left-prefix of the index definition. The following queries will not use the index because the field city is not specified in the query. A collection scan is required to solve them.
db.people.find( { age: 20 } )
db.people.find( { age: { $gt: 40 }, "person.surname": "Brown" } )
db.people.find( { "person.surname": "Green" }javascript:void(0)
The ascending or descending order is more important in the case of compound indexes. In fact, we need to pay attention when defining the order in the index because not all the queries can rely on it, in particular, when using the sort() function.
Assume we have the compound index defined as follows:
db.people.createIndex( {city: 1, age: -1} )
The specified sort direction in the documents, when use sort(), must match the same pattern of the index definition or the inverse pattern. Any other pattern will not be supported by the index.
So, the following queries will use the index for sorting:
db.people.find( ).sort( city: 1, age: -1)
db.people.find( ).sort( city: -1, age: 1)
The following queries will not use the index:
db.people.find( ).sort( { city: 1, age: 1} )
db.people.find( ).sort( { city: -1, age: -1} )
The following queries will use the index both for retrieving the documents and for sorting.
db.people.find( { city: "New York" } ).sort( city: 1, age: -1)
db.people.find( { city: "Miami", age:{ $gt: 50 } } ).sort( city: -1, age: 1)
They use the left prefix and the pattern of the sort() stage matches exactly the index pattern or the inverse pattern.

Multikey Indexes

This is the index type for arrays. When creating an index on an array, MongoDB will create an index entry for every element.
Let's assume we have these documents:
{
   "_id": 1,
   "person": { name: "John", surname: "Brown" },
   "age": 34,
   "city": "New York",
   "hobbies": [ "music", "gardening", "skiing" ]
 }
The multikey index can be created as:
db.people.createIndex( { hobbies: 1} )
Queries such as these next examples will use the index:
db.people.find( { hobbies: "music" } )
db.people.find( { hobbies: "music", hobbies: "gardening" } )

Geospatial and Text Indexes

MongoDB supports specialized indexes also for geospatial data and for text searches. They are out of the scope of the goal of this article, but it is worth mentioning them. We'll look at these in a future post maybe.

Index Options

TTL

The acronym stands for Time To Live. This is a special option that we can apply only to a single field index to permit the automatic deletion of documents after a certain time.
During index creation, we can define an expiration time. After that time, all the documents that are older than the expiration time will be removed from the collection. This kind of feature is very useful when we are dealing with data that don't need to persist in the database. A good example of this is session data.
Let's see how to define the TTL option on the sessionlog collection.
db.sessionlog.createIndex( { "lastUpdateTime": 1 }, { expireAfterSeconds: 1800 } )
In this case, MongoDB will drop the documents from the collection automatically once half an hour (1800 seconds) has passed since the value in lastUpdateTime field.
MongoDB runs a background process every 60 seconds to drop the expired documents. So, this means that the real deletion of a document can be applied with a short delay, not precisely after expireAfterSeconds.
There are some restrictions:
  • as already mentioned, only single field indexes can have the TTL option
  • the _id single field index cannot support the TTL option
  • the indexed field must be a date type
  • a capped collection cannot have a TTL index
In a replica set, documents are deleted only in the primary node; the background process works only on the primary node, with deletions correctly replicated as per any other event.

Partial Indexes

partial index is an index that contains only a subset of the values based on a filter rule. They are useful in cases where:
  • the index size can be reduced
  • we want to index the most relevant and used values in the query conditions
  • we want to index the most selective values of a field
Here's how to create a partial index using the clause partialFilterExpression.
db.people.createIndex(
   { "city": 1, "person.surname": 1 },
   { partialFilterExpression: { age : { $lt: 30 } } }
)
We have created a compound index on city and person.surname but only for the documents with ageless than 30.
In order for the partial index to be used, the queries must contain a condition on the age field.
db.people.find( { city: "New Tork", age: { $eq: 20} } )
The following queries cannot use the partial index because the results would be incomplete. In these examples, MongoDB will use a collection scan:
db.people.find( { city: "Miami" } )
db.people.find( { city: "Orlando", age : { $gt: 25 } } )

Sparse Indexes

Sparse indexes are a subset of partial indexes. A sparse index only contains elements for the documents that have the indexed field even if it is null.
Since MongoDB is a schemaless database, the documents in a collection can have different fields, so an indexed field may not be present in some of them.
To create such an index, use the sparse option:
db.people.createIndex( { city: 1 }, { sparse: true } )
In this case, we are assuming there could be documents in the collection with the field city missing.
Regular indexes — without the sparse option — contain all the documents of the collection with a nullvalue for the elements whose documents don't contain the indexed field.
Sparse indexes are based on the existence of a field in the documents and are useful to reduce the size of the index. Since they are a subset of partial indexes, if you have to decide, then you should choose partial indexes.
Defining a partial index with the following filter is the same as having a sparse index:
db.people.createIndex(
  { city: 1 },
  { partialFilterExpression: { city: { $exists: true }  } }
)

Unique indexes

MongoDB can create an index as unique. An index defined this way cannot contain duplicate entries.
To create such an index, use the unique option.
db.people.createIndex( { city: 1 }, { unique: true } )
Uniqueness can be defined for compound indexes too.
db.people.createIndex( { city: 1, person.surname: 1}, { unique: true } )
By default, the index on _id is automatically created as unique.
You cannot create the index as unique if you already have documents in the collection with duplicates for the proposed index.
As we know, missing fields in the documents are inserted in the index with a null value, even in the case of a unique index creation. Consequently, only one null value is permitted in the index because of the uniqueness restraint. Our suggestion is to try to avoid missing fields when creating a unique index.

A Couple of Useful Commands

Here are a couple of useful commands when dealing with indexes.

List of the Indexes in a Collection

Use the getIndexes() method.
MongoDB > db.restaurants.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.restaurants"
},
{
"v" : 2,
"key" : {
"borough" : 1
},
"name" : "borough_1",
"ns" : "test.restaurants"
},
{
"v" : 2,
"key" : {
"cuisine" : 1
},
"name" : "cuisine_1",
"ns" : "test.restaurants"
},
{
"v" : 2,
"key" : {
"cuisine" : 1,
"grades.score" : 1
},
"name" : "cuisine_1_grades.score_1",
"ns" : "test.restaurants"
}
]
In this sample collection, we have 4 indexes, including the _id.

Drop an Existing Index

Use the dropIndex() method passing the name of the index to be dropped.
MongoDB > db.restaurants.dropIndex("cuisine_1_grades.score_1")
{ "nIndexesWas" : 4, "ok" : 1 }

Conclusions

In the first part of this two-part series, I've described the indexes available in MongoDB, and how you can create and use them. We've also discussed some of their main features. In the next part, we'll focus on the explain() method, and using some examples, we'll show how we can investigate queries and, if possible, how we can optimize them.

Комментариев нет:

Отправить комментарий