Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dataExpiration from createCollection (capped collection one) (mongo) #2159

Open
fgalan opened this issue May 3, 2022 · 8 comments
Open

Comments

@fgalan
Copy link
Member

fgalan commented May 3, 2022

https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mongo/MongoBackendImpl.java

    public void createCollection(String dbName, String collectionName, long collectionsSize, long maxDocuments,
            long dataExpiration) throws MongoException

The dataExpiration parameters (and associtted logic in the function) should be removed, as, according to MongoDB documentation on capped collections (https://www.mongodb.com/docs/manual/core/capped-collections/):

TTL indexes are not compatible with capped collections.

@fgalan
Copy link
Member Author

fgalan commented May 3, 2022

I understand that all the calls to that createCollection() method use dataExpiration == 0 (or errors would be happening) I would be a good idea to check it.

@AlvaroVega
Copy link
Member

AlvaroVega commented May 3, 2022

Cygnus is creating a capped collection just when:

if (collectionsSize != 0 && maxDocuments != 0) {
CreateCollectionOptions options = new CreateCollectionOptions()
.capped(true)
.sizeInBytes(collectionsSize)
.maxDocuments(maxDocuments);
LOGGER.debug("Creating Mongo collection=" + collectionName + " at database=" + dbName + " with "

I'm afraid that If just one of these options are provide then nothing is done

https://github.com/telefonicaid/fiware-cygnus/blob/master/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md#section2.1

data_expiration no 0 Collections will be removed if older than the value specified in seconds. The reference of time is the one stored in the recvTime property. Set to 0 if not wanting this policy.
collections_size no 0 The oldest data (according to insertion time) will be removed if the size of the data collection gets bigger than the value specified in bytes. Notice that the size-based truncation policy takes precedence over the time-based one. Set to 0 if not wanting this policy. Minimum value (different than 0) is 4096 bytes.
max_documents no 0 The oldest data (according to insertion time) will be removed if the number of documents in the data collections goes beyond the specified value. Set to 0 if not wanting this policy.

Maybe these documentation should be updated to reflect that both options should be provided with a value >0 to achieve that.

According with mongo driver java:

https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/CreateCollectionOptions.html#sizeInBytes-long-

sizeInBytes(long sizeInBytes)
Gets the maximum size of in bytes of a capped collection

maxDocuments(long maxDocuments)
Sets the maximum number of documents allowed in a capped collection.

sizeInBytes does get?

@AlvaroVega
Copy link
Member

AlvaroVega commented May 3, 2022

I understand that all the calls to that createCollection() method use dataExpiration == 0 (or errors would be happening) I would be a good idea to check it.

But using this sink config

cygnus-ngsi.sinks.sth-sink.data_expiration = 10000
cygnus-ngsi.sinks.sth-sink.collections_size = 536870912
cygnus-ngsi.sinks.sth-sink.max_documents = 1000

No errors are reported when createCollection (capped + creaetIndex):

time=2022-05-03T14:21:30.986Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=createCollection | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[158] : Creating Mongo collection=sth_/device:dispAVG2_device at database=sth_smartcity with collections_size=100000 and max_documents=1000 options
time=2022-05-03T14:21:31.056Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=insertContextDataAggregatedForResolution | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[307] : Updating data, database=sth_smartcity, collection=sth
/_device:dispAVG2_device.aggr, queries=[{"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651587660000}, "resolution": "second", "range": "minute"}, "points.offset": 30}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651586400000}, "resolution": "minute", "range": "hour"}, "points.offset": 21}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651536000000}, "resolution": "hour", "range": "day"}, "points.offset": 14}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651363200000}, "resolution": "day", "range": "month"}, "points.offset": 3}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1640995200000}, "resolution": "month", "range": "year"}, "points.offset": 5}], updates=[{"$set": {"attrType": "DateTime"}, "$inc": {"points.30.samples": 1, "points.30.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.21.samples": 1, "points.21.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.14.samples": 1, "points.14.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.2.samples": 1, "points.2.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.4.samples": 1, "points.4.occur.2022-05-03T14:21:29.532Z": 1}}]

@AlvaroVega
Copy link
Member

AlvaroVega commented May 3, 2022

According with https://www.mongodb.com/docs/manual/tutorial/expire-data/

To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values.

But currently cygnus is using:

options = new IndexOptions().name("cyg_agg_exp").expireAfter(dataExpiration, TimeUnit.SECONDS);

expireAfterSeconds != expireAfter
Is this a bug?
No: java driver uses expireAfter: https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/IndexOptions.html#expireAfter-java.lang.Long-java.util.concurrent.TimeUnit-

@fgalan
Copy link
Member Author

fgalan commented May 4, 2022

No errors are reported when createCollection (capped + creaetIndex):

We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.

Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.

I suggest to keep this issue on hold while we get feedback from MongoDB team.

@fgalan
Copy link
Member Author

fgalan commented May 4, 2022

With regards to the block:

            if (collectionsSize != 0 && maxDocuments != 0) {
                CreateCollectionOptions options = new CreateCollectionOptions()
                        .capped(true)
                        .sizeInBytes(collectionsSize)
                        .maxDocuments(maxDocuments);
                LOGGER.debug("Creating Mongo collection=" + collectionName + " at database=" + dbName + " with "
                        + "collections_size=" + collectionsSize + " and max_documents=" + maxDocuments + " options");
                db.createCollection(collectionName, options);
            }

It's not correct, as both collectionSize and maxDocuments can be used idependently, according to MongoDB documentation at https://www.mongodb.com/docs/manual/reference/method/db.createCollection/#mongodb-method-db.createCollection:

imagen

So I'd suggest to use independent conditions and include a link to MongoDB documentation from Cygnus documentation in the case users want to know how to use them (e.g. which one takes preference over the other, etc.).

@AlvaroVega
Copy link
Member

related #2163

@fgalan
Copy link
Member Author

fgalan commented May 6, 2022

No errors are reported when createCollection (capped + creaetIndex):

We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.

Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.

I suggest to keep this issue on hold while we get feedback from MongoDB team.

From MongoDB JIRA ticket:

My understanding is that we should update the language on the capped collection page to not make a blanket statement that TTL indexes are incompatible, but that the TTL index won't perform deletes.

Thus, it seems that TTL indexes can be created in capped collection, but they don't achieve the desire effect (in other words, they are useless).

So, the question is:

  1. Should Cygnus avoid creating this useless TTL index in this case?
  2. Should Cygnus stay as it is now creating the TTL index assuming that the user knows what happen in this situation?

In my opinion it's better option 1. Having an index that doesn't work as expected can be confusing for users without expertise in MongoDB.

@AlvaroVega AlvaroVega changed the title Remove dataExpiration from createCollection (capped collection one) Remove dataExpiration from createCollection (capped collection one) (mongo) Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants