Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MongoDB indexes cyg_raw_opt could have worsened in newers versions #2291

Open
fgalan opened this issue Sep 12, 2023 · 9 comments
Open

MongoDB indexes cyg_raw_opt could have worsened in newers versions #2291

fgalan opened this issue Sep 12, 2023 · 9 comments

Comments

@fgalan
Copy link
Member

fgalan commented Sep 12, 2023

Based on the discussion on https://stackoverflow.com/questions/76570277/differences-in-query-logs-when-read-by-sth-comet-caused-by-cygnus-version-upgrad (I'd suggest to read it carefully) the indexes created by Cygnus at MongoDB could have worsened from 2.20.0 to 3.2.0

In 2.20.0:

> db['sth_/'].getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "entityId" : 1,
                        "entityType" : 1,
                        "attrName" : 1,
                        "recvTime" : 1
                },
                "name" : "cyg_raw_opt"
        },
        {
                "v" : 2,
                "key" : {
                        "recvTime" : 1
                },
                "name" : "cyg_raw_exp",
                "expireAfterSeconds" : NumberLong(1576800000)
        }
]

In 3.2.0:

rs0:PRIMARY> db['sth_/'].getIndexes()
[
    {
        "v" : 2,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_"
    },
    {
        "v" : 2,
        "key" : {
            "entityId" : 1,
            "entityType" : 1,
            "attrName" : 1,
            "recvTime" : 1
        },
        "name" : "entityId_1_entityType_1_attrName_1_recvTime_1"
    },
    {
        "v" : 2,
        "key" : {
            "recvTime" : 1,
            "entityId" : 1,
            "entityType" : 1,
            "attrName" : 1,
            "attrType" : 1,
            "attrValue" : 1
        },
        "name" : "cyg_raw_opt"
    },
    {
        "v" : 2,
        "key" : {
            "recvTime" : 1
        },
        "name" : "cyg_raw_exp",
        "expireAfterSeconds" : NumberLong(1576800000)
    }
]

The entityId_1_entityType_1_attrName_1_recvTime_1 was added manually (the same as old cyg_raw_opt).

@fgalan
Copy link
Member Author

fgalan commented Sep 12, 2023

Based on the discussion on https://stackoverflow.com/questions/76570277/differences-in-query-logs-when-read-by-sth-comet-caused-by-cygnus-version-upgrad

Answer to that question at SOF should be edited when this issue gets addressed (maybe saying the new version in which the fix would be included).

@fgalan
Copy link
Member Author

fgalan commented Sep 12, 2023

Relevant cygnus configuration, from SOF question:

    environment:
      - CYGNUS_MONGO_HOSTS=mongo:27017
      - CYGNUS_API_PORT=5080
      - CYGNUS_MONGO_DATA_MODEL=dm-by-service-path
      - CYGNUS_STH_DATA_MODEL=dm-by-service-path

@Madhu1029
Copy link
Contributor

Hi @fgalan ,

I have tried to reproduce the issue. Index creation is reproduced successfully.
I could not find planSummary in Mongo logs. I have tried to check the scenario as below:

Prepared the mongo query as per the code https://github.com/telefonicaid/fiware-sth-comet/blob/2.10.0/lib/database/sthDatabase.js#L503 and then get the executionStats of the query.
In case of collection-per-service-path, below mongo query will be executed:
db["sth/testsink"].find({"entityId":"Room1","entityType":"Room","attrName":"temperature"}, {_id: 0,attrType: 1,attrValue: 1,recvTime: 1}).sort({ recvTime: -1 }).explain('executionStats');

winningPlan of above query is using index cyg_raw_opt index. So, it seems that index is correctly used by STH-Comet.

"stage" : "PROJECTION_SIMPLE",
                        "transformBy" : {
                                "_id" : 0,
                                "attrType" : 1,
                                "attrValue" : 1,
                                "recvTime" : 1
                        },
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "$and" : [
                                                {
                                                        "attrName" : {
                                                                "$eq" : "temperature"
                                                        }
                                                },
                                                {
                                                        "entityId" : {
                                                                "$eq" : "Room1"
                                                        }
                                                },
                                                {
                                                        "entityType" : {
                                                                "$eq" : "Room"
                                                        }
                                                }
                                        ]
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "recvTime" : 1,
                                                "entityId" : 1,
                                                "entityType" : 1,
                                                "attrName" : 1,
                                                "attrType" : 1,
                                                "attrValue" : 1
                                        },
                                        "indexName" : "cyg_raw_opt",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                                "recvTime" : [ ],
                                                "entityId" : [ ],
                                                "entityType" : [ ],
                                                "attrName" : [ ],
                                                "attrType" : [ ],
                                                "attrValue" : [ ]
                                        },

As per my understanding, index is correctly used. Please confirm my understanding.

@fgalan
Copy link
Member Author

fgalan commented Oct 17, 2023

@ctc-watanabe as you were the one that raised this issue, could you have a look to @MadhuNEC comment and provide feedback, please? Thanks in advance!

@fgalan
Copy link
Member Author

fgalan commented Oct 17, 2023

Note PR #2314 has been recently merged.

Not sure if this PR could be related with this issue... please take it into account.

@ctc-watanabe
Copy link

@fgalan
Thank you for making contact.

I have investigated on the master branch, but it seems that the problem has not been resolved.
As @MadhuNEC says, indexes are used for the FIND process performed by STH-Comet.
However, the process that no longer uses the index is the COUNT process that is executed before the FIND process, as shown in the log I described in the SOF.

// Log of STH-Comet query to get data written by Cygnus 2.20.0
{"t":{"$date":"2023-06-28T03:54:01.910+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.$cmd","command":{"listCollections":1,"filter":{"name":"sth_/"},"cursor":{},"nameOnly":true,"lsid":{"id":{"$uuid":"a7f6f881-1db3-42f0-86b6-c14fc3425419"}},"$db":"sth_example"},"numYields":0,"reslen":166,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":1}}},"protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2023-06-28T03:54:01.916+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.sth_/","command":{"count":"sth_/","query":{"entityId":"Car1","entityType":"Car","attrName":"speed"},"skip":0,"limit":0,"$db":"sth_example"},"planSummary":"COUNT_SCAN { entityId: 1, entityType: 1, attrName: 1, recvTime: 1 }","keysExamined":3,"docsExamined":0,"numYields":0,"queryHash":"41C2776D","planCacheKey":"2D54FA2D","reslen":45,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2023-06-28T03:54:01.920+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.sth_/","command":{"find":"sth_/","filter":{"entityId":"Car1","entityType":"Car","attrName":"speed"},"sort":{"recvTime":-1},"limit":5,"returnKey":false,"showRecordId":false,"lsid":{"id":{"$uuid":"a7f6f881-1db3-42f0-86b6-c14fc3425419"}},"$db":"sth_example"},"planSummary":"IXSCAN { entityId: 1, entityType: 1, attrName: 1, recvTime: 1 }","keysExamined":2,"docsExamined":2,"cursorExhausted":true,"numYields":0,"nreturned":2,"queryHash":"C0A2C615","planCacheKey":"26B66C8B","reslen":380,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}}

// Log of STH-Comet query to get data written by Cygnus 3.2.0
{"t":{"$date":"2023-06-28T03:43:44.631+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.$cmd","command":{"listCollections":1,"filter":{"name":"sth_/"},"cursor":{},"nameOnly":true,"lsid":{"id":{"$uuid":"8dde283d-302e-4374-aa83-a3c78a19080c"}},"$db":"sth_example"},"numYields":0,"reslen":166,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":1}}},"protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2023-06-28T03:43:44.634+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.sth_/","command":{"count":"sth_/","query":{"entityId":"Car1","entityType":"Car","attrName":"speed"},"skip":0,"limit":0,"$db":"sth_example"},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":2,"numYields":0,"queryHash":"41C2776D","planCacheKey":"2D54FA2D","reslen":45,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2023-06-28T03:43:44.636+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn11","msg":"Slow query","attr":{"type":"command","ns":"sth_example.sth_/","command":{"find":"sth_/","filter":{"entityId":"Car1","entityType":"Car","attrName":"speed"},"sort":{"recvTime":-1},"limit":5,"returnKey":false,"showRecordId":false,"lsid":{"id":{"$uuid":"8dde283d-302e-4374-aa83-a3c78a19080c"}},"$db":"sth_example"},"planSummary":"IXSCAN { recvTime: 1, entityId: 1, entityType: 1, attrName: 1, attrType: 1, attrValue: 1 }","keysExamined":2,"docsExamined":2,"cursorExhausted":true,"numYields":0,"nreturned":2,"queryHash":"C0A2C615","planCacheKey":"26B66C8B","reslen":380,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Collection":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{},"protocol":"op_msg","durationMillis":0}}

Probably, it is the process of the following code in STH-Comet.
https://github.com/telefonicaid/fiware-sth-comet/blob/9ece83786b6ab7785416be29c3a89a551364a151/lib/database/sthDatabase.js#L510-L523

I checked the COUNT process with the following command, and it is still COUNT_SCAN in Cygnus 2.20.0.

> db.runCommand({explain: {count: 'sth_/',query: {"entityId":"Car1","entityType":"Car","attrName":"speed"}}})
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "sth_example.sth_/",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "attrName" : {
                                                "$eq" : "speed"
                                        }
                                },
                                {
                                        "entityId" : {
                                                "$eq" : "Car1"
                                        }
                                },
                                {
                                        "entityType" : {
                                                "$eq" : "Car"
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "COUNT",
                        "inputStage" : {
                                "stage" : "COUNT_SCAN",
                                "keyPattern" : {
                                        "entityId" : 1,
                                        "entityType" : 1,
                                        "attrName" : 1,
                                        "recvTime" : 1
                                },
                                "indexName" : "cyg_raw_opt",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "entityId" : [ ],
                                        "entityType" : [ ],
                                        "attrName" : [ ],
                                        "recvTime" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "indexBounds" : {

However, in Cygnus 3.2.0 and master branch, it is COLLSCAN as follows.

> db.runCommand({explain: {count: 'sth_/',query: {"entityId":"Car1","entityType":"Car","attrName":"speed"}}})
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "sth_example.sth_/",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [
                                {
                                        "attrName" : {
                                                "$eq" : "speed"
                                        }
                                },
                                {
                                        "entityId" : {
                                                "$eq" : "Car1"
                                        }
                                },
                                {
                                        "entityType" : {
                                                "$eq" : "Car"
                                        }
                                }
                        ]
                },
                "winningPlan" : {
                        "stage" : "COUNT",
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "$and" : [
                                                {
                                                        "attrName" : {
                                                                "$eq" : "speed"
                                                        }
                                                },
                                                {
                                                        "entityId" : {
                                                                "$eq" : "Car1"
                                                        }
                                                },
                                                {
                                                        "entityType" : {
                                                                "$eq" : "Car"
                                                        }
                                                }
                                        ]
                                },
                                "direction" : "forward"
                        }
                },

@AlvaroVega
Copy link
Member

AlvaroVega commented Oct 18, 2023

After this PR #2205 indexed are created having into account DataModel (DM) type. Which DM is involve in this case ?

@ctc-watanabe
Copy link

In my test environment, I set "dm-by-service-path" for DM.

@AlvaroVega
Copy link
Member

AlvaroVega commented Oct 19, 2023

Then index for sth collection involves the following keys:

  • recvTime, entityId, entityType, attrName, attrType, attrValue for cyg_raw_opt index
  • entityId, entityType, attrName, resolution, origin for cyg_agg_exp index

Moreover, after PR #2314 indexes when expirates are recreated even when conflict due to datamodel has changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants