https://dzone.com/articles/mongodb-investigate-queries-with-explain-and-index?edition=399207&utm_source=Zone%20Newsletter&utm_medium=email&utm_campaign=database%202018-09-21
This is the second part of a two-part series. In MongoDB: Index Usage and MongoDB explain(), we introduced the main index types supported by MongoDB and how to create and use them. In this second article, we are going to see some examples of how to use the explain() method to investigate queries. Do you need to optimize MongoDB queries? You'll see how to use explain() to find out how your query uses indexes. Or, perhaps, that it doesn't use indexes!
What Is explain()?
explain() is a method that you can apply to simple queries or to cursors to investigate the query execution plan. The execution plan is how MongoDB resolves a query. Looking at all the information returned by explain(), we can see find out stuff like:
- how many documents were scanned
- how many documents were returned
- which index was used
- how long the query took to be executed
- which alternative execution plans were evaluated
...and other useful information.
The aim of using explain() is to find out how to improve the query, for example, by creating missing indexes or by rewriting it in order to use existing indexes more correctly. If you are familiar with the EXPLAIN command in MySQL, the goals of MongoDB's explain() method are exactly the same.
Explainable Object
You can apply the explain() method to a query or a cursor in the following way, as per any other method:
However, the preferred way to investigate queries in the mongo shell is to create first the explainable object.
We can create an explainable object like this:
Once you have created the explainable object, then any kind of operation can be run against it to investigate a query or cursor execution plan. For example:
Restaurants Test Database
To see some examples, we need a collection with some data.
For our purposes, we can use the New York restaurants database. You can download this from the following URL: https://www.w3resource.com/mongodb-exercises/retaurants.zip
For our purposes, we can use the New York restaurants database. You can download this from the following URL: https://www.w3resource.com/mongodb-exercises/retaurants.zip
Unzip the archive and import the JSON file into MongoDB:
This collection has 3772 documents: all the restaurants in New York City. Here is a document sample.
Explain() Verbosity
The explain() method has three verbosity modes.
- queryPlanner — this is the default mode. At this level, explain provides information about the winning plan, including the index used or if a collection scan is needed (COLLSCAN)
- executionStats — this mode includes all the information provided by the queryPlanner plus the statistics. Statistics include details such as the number of documents examined and returned, the execution time in milliseconds, and so on.
- allPlansExecution — this mode includes all the information provided by the executionStats plus information about the discarded execution plans
We'll see the explain() output in the following examples.
Example 1
It's time to use the restaurant collection to run our first example: find out all the restaurants in the Manhattan borough.
Let's create first the explainable object with the executionStats mode.
Then let's investigate the query.
Here, we can see the output of the explain(). First of all, we can clearly distinguish the "queryPlanner" and the "executionStats" modes. We won't describe each one of the values, as some are really intuitive. Let's have a look just at some of them:
queryPlanner.winningPlan.stage = "COLLSCAN"
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized because all the documents must be read.
This provides very important information about the winning plan: it means that MongoDB needs to do a collection scan. The query is not optimized because all the documents must be read.
queryPlanner.winningPlan.rejectedPlans = []
It's empty. There aren't rejected plans. When the query needs to be executed with COLLSCAN, the only execution plan is the winning plan. We don't have any indexes in the collection apart from the one on_id, so there aren't other execution plans.
It's empty. There aren't rejected plans. When the query needs to be executed with COLLSCAN, the only execution plan is the winning plan. We don't have any indexes in the collection apart from the one on_id, so there aren't other execution plans.
executionStats.nReturned = 1883
The number of documents returned is 1883, the number of restaurants located in Manhattan.
The number of documents returned is 1883, the number of restaurants located in Manhattan.
executionStats.totalDocsExamined = 3772
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN
The number of documents examined is exactly the number of documents in the collection. This was expected because the query uses COLLSCAN
executionStats.executionTimeMillis = 1
The execution time of the query. It's just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!
The execution time of the query. It's just 1 millisecond. This might seem good, but remember that this is the time needed to scan just 3772 documents, a very small test collection. Think about what this time could be in the case of a collection with millions of documents!
How can we improve the query?
In this case, it's simple. Let's try to create a single field index on borough, the only condition we have in the find(). Then let's try to explain the same query again.
Interpreting the Output
Now, the output is completely different. Let's have a look at some of the most relevant values:
queryPlanner.winningPlan.inputStage.stage = "IXSCAN"
This is very important. IXSCAN means that now MongoDB doesn't need to do a collection scan but an index can be used to find the documents.
This is very important. IXSCAN means that now MongoDB doesn't need to do a collection scan but an index can be used to find the documents.
queryPlanner.winningPlan.inputStage.indexName = "borough_1"
The name of the index used. This is the default name of an index: the name of the field plus the_1 for ascending or _-1 for descending order.
The name of the index used. This is the default name of an index: the name of the field plus the_1 for ascending or _-1 for descending order.
queryPlanner.winningPlan.inputStage.direction = "forward"
MongoDB traverses the index in a forward direction.
MongoDB traverses the index in a forward direction.
executionStats.nRertuned = 1883
The number of documents returned. Obviously, this is the same as before.
The number of documents returned. Obviously, this is the same as before.
executionStats.totalKeysExamined = 1883
The number of keys examined in the index.
The number of keys examined in the index.
executionStats.totalDocsExamined = 1883
Now the number of documents examined corresponds to the number of elements examined in the index.
Now the number of documents examined corresponds to the number of elements examined in the index.
We have optimized the query.
Example 2
Now we would like to examine a query to find out all the restaurants with Italian cuisine that received a grade score of greater than 50.
We have a COLLSCAN again. Let's try to improve the query creating an index on the cuisine field.
The query has improved. The created index cuisine_1 is used, but still, we have 325documents examined with only 6 documents returned. Let's see if we can do better by creating instead a compound index that uses both the fields in the condition: cuisine and grades.score.
Now, the winning plan uses the new compound index cuisine_1_grades.score_1and we have only 6 documents examined. Please note also that now we have a rejected plan, the one that uses the single field index cuisine_1 previously created.
We have optimized the query.
Example 3
Let's find out all the restaurants that don't prepare any "American" cuisine in Brooklyn and achieved a grade of "A." We want to see the results ordered by cuisine, descending.
At this point, you should be a little familiar with the explain() output, so in the next box, we truncate it for the sake of simplicity, leaving only the relevant parts.
Looking at the winning plan, we see that we have IXSCAN on the index cuisine_1. The important thing to notice here is the choice of the index on cuisine field-it's because we have used sort({cuisine:-1}). There is not a SORT stage because the documents are already extracted using the index, and so they are already sorted. Just notice, too, direction:"backward" — this is because we specified descending order in the query. If we try to execute a slightly different query, changing the sorting to name:1 instead of cuisine:-1, we'll see a completely different winning plan.
In this case, we have fewer documents examined but since the cuisine_1 index cannot be used, a SORT stage is needed, and the index used to fetch the document is borough_1. While MongoDB has examined fewer documents, the execution time is worse because of the extra stage used to sort the documents.
Let's return now to the original query. We can also notice that the number of documents examined is still too high (2517) compared to the documents returned (493). That's not optimal. Let's see if we can further improve the query by adding another compound index on (cuisine, borough, grades.grade).
Now, MongoDB uses the new index and does not need the extra sorting stage. The number of documents examined is the same as the number of documents returned. Also, the execution time is better.
We have optimized the query.
Example 4
This is the final example. Let's find out the restaurants where the grades array contains a grade of "A" and a score of 9 for a specific date.
A COLLSCAN again. In the query, we notice that all the conditions refer to embedded fields in an array object. So let's try to create a multikey index. Let's just a create the index on the date field only and see what happens.
MongoDB uses the index and the winning plan is already good enough. You can try as an exercise to create a compound index including other embedded fields of the array, like {"grades.date":1, "grades.grade":1, "grades.score":1} and see what happens. You will probably see that the index we have created only on date is good enough. Enlarging the compound index will generate only rejected plans. This is because the date field is the most selective.
Hint: when dealing with compound indexes, please remember that the order of the fields is important. The first field should be the most selective one, and the last one should be the less selective. Or, in cases where you don't need to put a lot of fields in the index, the most selective ones are probably all that you need to improve the queries.
Conclusion
That's all folks. In this two-part series, we have seen the indexes available in MongoDB and how to use explain() to investigate queries and find out how to improve their performance. We don't expect you to now know everything, but we hope this will be a good starting point for you to practice MongoDB query optimization using explain(). You'll find plenty more information in the manual and on the internet about indexes and explain() if you are motivated to find out more.
Thanks for reading, and if you have any thoughts or questions, feel free to write them in the comments!
Комментариев нет:
Отправить комментарий