In the following code, getDataSize currently returns an estimated value. The Iceberg rolling file write operation relies on this method to decide when to roll over to a new file, which may result in writing files that are much smaller than expected.
/**
 * @return the total size of data written to the file and buffered in memory
 */
public long getDataSize() {
  return lastRowGroupEndPos + columnStore.getBufferedSize();
}
Could we provide a potentially larger getDataSize? I can't think of any downsides at the moment.
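To make the symptom concrete, here is a minimal sketch of how a size-based rolling policy can end up with small files. It is not Parquet or Iceberg code: `RollingWriterSketch`, `EstimatingWriter`, `writeRolling`, and the `compressionRatio` parameter are all hypothetical, and the model assumes that buffered bytes shrink by a fixed ratio when they are flushed to disk. Only the shape of `getDataSize()` (flushed position plus buffered size) is taken from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

class RollingWriterSketch {

    /** Hypothetical writer whose size estimate mirrors lastRowGroupEndPos + bufferedSize. */
    static final class EstimatingWriter {
        private final long rowGroupBytes;      // buffered bytes that trigger a row-group flush
        private final double compressionRatio; // assumed on-disk bytes per buffered byte
        private long lastRowGroupEndPos = 0;   // bytes already written to the file
        private long bufferedSize = 0;         // bytes still held in memory

        EstimatingWriter(long rowGroupBytes, double compressionRatio) {
            this.rowGroupBytes = rowGroupBytes;
            this.compressionRatio = compressionRatio;
        }

        void write(long recordBytes) {
            bufferedSize += recordBytes;
            if (bufferedSize >= rowGroupBytes) {
                flush();
            }
        }

        /** Moves buffered data to "disk", shrinking it by the assumed ratio. */
        void flush() {
            lastRowGroupEndPos += (long) (bufferedSize * compressionRatio);
            bufferedSize = 0;
        }

        /** Estimate: flushed bytes plus in-memory bytes at their buffered size. */
        long getDataSize() {
            return lastRowGroupEndPos + bufferedSize;
        }

        /** Flushes the remainder and returns the final file size. */
        long close() {
            flush();
            return lastRowGroupEndPos;
        }
    }

    /** Rolls to a new file whenever the estimate reaches targetBytes; returns final file sizes. */
    static List<Long> writeRolling(int records, long recordBytes, long targetBytes,
                                   long rowGroupBytes, double compressionRatio) {
        List<Long> fileSizes = new ArrayList<>();
        EstimatingWriter writer = new EstimatingWriter(rowGroupBytes, compressionRatio);
        for (int i = 0; i < records; i++) {
            writer.write(recordBytes);
            if (writer.getDataSize() >= targetBytes) {
                fileSizes.add(writer.close());
                writer = new EstimatingWriter(rowGroupBytes, compressionRatio);
            }
        }
        long lastSize = writer.close();
        if (lastSize > 0) {
            fileSizes.add(lastSize);
        }
        return fileSizes;
    }

    public static void main(String[] args) {
        // Target of 10,000 bytes, but every file closes at 2,500 bytes under the assumed ratio.
        System.out.println(writeRolling(1000, 100, 10_000, 1_000_000, 0.25));
    }
}
```

In this toy model the rolling check fires while all data is still buffered, so each file closes at a quarter of the target size. How closely this matches the real writer depends on how columnStore.getBufferedSize() accounts for encoding and compression, which the sketch does not attempt to reproduce.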
Component(s)
No response
dragongu changed the title from "Confusion in the implementation of InternalParquetRecordWriter" to "Could we provide a potentially larger InternalParquetRecordWriter.getDataSize" on Oct 11, 2024
Do you have any concrete suggestion on what value to provide? My concern is that changing the behavior may affect a lot of downstream applications in the wild without notice.