Releases: facebook/rocksdb
RocksDB 6.20.3
6.20.3 (2021-05-05)
Bug Fixes
- Fixed a bug where GetLiveFiles() output included a non-existent file called "OPTIONS-000000". Backups and checkpoints, which use GetLiveFiles(), failed on DBs impacted by this bug. Read-write DBs were impacted when the latest OPTIONS file failed to write and fail_if_options_file_error == false. Read-only DBs were impacted when no OPTIONS files existed.
6.20.2 (2021-04-23)
Bug Fixes
- Fixed a bug in handling file rename error in distributed/network file systems when the server succeeds but client returns error. The bug can cause CURRENT file to point to non-existing MANIFEST file, thus DB cannot be opened.
- Fixed a bug where ingested files were written with incorrect boundary key metadata. In rare cases this could have led to a level's files being wrongly ordered and queries for the boundary keys returning wrong results.
- Fixed a data race between insertion into memtables and the retrieval of the DB properties rocksdb.cur-size-active-mem-table, rocksdb.cur-size-all-mem-tables, and rocksdb.size-all-mem-tables.
- Fixed the false-positive alert when recovering from the WAL file. Avoid reporting "SST file is ahead of WAL" on a newly created empty column family, if the previous WAL file is corrupted.
Behavior Changes
- Due to the fix of the false-positive alert of "SST file is ahead of WAL", all CFs with no SST file (empty CFs) will bypass the consistency check. We fixed a false positive but introduced a very rare false negative, which is triggered under the following conditions: a CF receives some delete operations in its last few queries that result in an empty CF (those are flushed to an SST file and a compaction is triggered that combines this file with all other SST files and generates an empty CF, or there is another reason to write a manifest entry for this CF after a flush that generates no SST file from an empty CF). The deletion entries are logged in a WAL and this WAL was corrupted, while the CF's log number points to the next WAL (due to the flush). Therefore, the DB can only recover to the point without these trailing deletions, which leaves the DB in an inconsistent state.
6.20.0 (2021-04-16)
Behavior Changes
- ColumnFamilyOptions::sample_for_compression now takes effect for creation of all block-based tables. Previously it only took effect for block-based tables created by flush.
- CompactFiles() can no longer compact files from a lower level to an upper level, which risks corrupting the DB (details: #8063). The validation is also added to all compactions.
- Fixed some cases in which DB::OpenForReadOnly() could write to the filesystem. If you want a Logger with a read-only DB, you must now set DBOptions::info_log yourself, such as using CreateLoggerFromOptions().
- get_iostats_context() will never return nullptr. If thread-local support is not available, and user does not opt-out iostats context, then compilation will fail. The same applies to perf context as well.
Bug Fixes
- Use thread-safe strerror_r() to get error messages.
- Fixed a potential hang in shutdown for a DB whose Env has high-pri thread pool disabled (Env::GetBackgroundThreads(Env::Priority::HIGH) == 0).
- Made BackupEngine thread-safe and added documentation comments to clarify what is safe for multiple BackupEngine objects accessing the same backup directory.
- Fixed crash (divide by zero) when compression dictionary is applied to a file containing only range tombstones.
- Fixed a backward iteration bug with partitioned filter enabled: not including the prefix of the last key of the previous filter partition in current filter partition can cause wrong iteration result.
- Fixed a bug that allowed DBOptions::max_open_files to be set with a non-negative integer with ColumnFamilyOptions::compaction_style = kCompactionStyleFIFO.
- Fixed a bug in handling file rename error in distributed/network file systems when the server succeeds but client returns error. The bug can cause CURRENT file to point to non-existing MANIFEST file, thus DB cannot be opened.
- Fixed a data race between insertion into memtables and the retrieval of the DB properties rocksdb.cur-size-active-mem-table, rocksdb.cur-size-all-mem-tables, and rocksdb.size-all-mem-tables.
Performance Improvements
- On ARM platform, use yield instead of wfe to relax cpu to gain better performance.
Public API change
- Added TableProperties::slow_compression_estimated_data_size and TableProperties::fast_compression_estimated_data_size. When ColumnFamilyOptions::sample_for_compression > 0, they estimate what TableProperties::data_size would have been if the "fast" or "slow" (see ColumnFamilyOptions::sample_for_compression API doc for definitions) compression had been used instead.
- Update DB::StartIOTrace and remove Env object from the arguments, as it is redundant: DB already has an Env object that is passed down to IOTracer::StartIOTrace.
- Added FlushReason::kWalFull, which is reported when a memtable is flushed due to the WAL reaching its size limit; those flushes were previously reported as FlushReason::kWriteBufferManager. Also, changed the reason for flushes triggered by the write buffer manager to FlushReason::kWriteBufferManager; they were previously reported as FlushReason::kWriteBufferFull.
- Extend file_checksum_dump ldb command and DB::GetLiveFilesChecksumInfo API for IntegratedBlobDB and get checksum of blob files along with SST files.
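The FlushReason relabeling above is easy to misread, so here is a small standalone sketch of the before/after mapping exactly as the note states it. It does not use RocksDB headers; the FlushTrigger enum, the ReportedFlushReason function, and the since_6_20 flag are hypothetical names introduced only for illustration, and the returned strings are the FlushReason enumerator names from the note.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of what caused a flush (not a RocksDB type).
enum class FlushTrigger { kWriteBufferManager, kWalSizeLimit };

// Returns the name of the FlushReason that would be reported for a trigger,
// before (since_6_20 == false) and after (since_6_20 == true) the change.
std::string ReportedFlushReason(FlushTrigger trigger, bool since_6_20) {
  switch (trigger) {
    case FlushTrigger::kWriteBufferManager:
      // Previously reported as kWriteBufferFull.
      return since_6_20 ? "kWriteBufferManager" : "kWriteBufferFull";
    case FlushTrigger::kWalSizeLimit:
      // Previously reported as kWriteBufferManager.
      return since_6_20 ? "kWalFull" : "kWriteBufferManager";
  }
  assert(false && "unhandled FlushTrigger");
  return "";
}
```

Code that filters on FlushReason::kWriteBufferManager in an EventListener would, under this reading, see a different set of flushes after upgrading.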
New Features
- Added the ability to open BackupEngine backups as read-only DBs, using BackupInfo::name_for_open and env_for_open provided by BackupEngine::GetBackupInfo() with include_file_details=true.
- Added BackupEngine support for integrated BlobDB, with blob files shared between backups when table files are shared. Because of current limitations, blob files always use the kLegacyCrc32cAndFileSize naming scheme, and incremental backups must read and checksum all blob files in a DB, even for files that are already backed up.
- Added an optional output parameter to BackupEngine::CreateNewBackup(WithMetadata) to return the BackupID of the new backup.
- Added BackupEngine::GetBackupInfo / GetLatestBackupInfo for querying individual backups.
- Made the Ribbon filter a long-term supported feature in terms of the SST schema (compatible with version >= 6.15.0) though the API for enabling it is expected to change.
RocksDB 6.19.3
6.19.3 (2021-04-19)
Bug Fixes
- Fixed a bug in handling file rename error in distributed/network file systems when the server succeeds but client returns error. The bug can cause CURRENT file to point to non-existing MANIFEST file, thus DB cannot be opened.
6.19.2 (2021-04-08)
Bug Fixes
- Fixed a backward iteration bug with partitioned filter enabled: not including the prefix of the last key of the previous filter partition in current filter partition can cause wrong iteration result.
6.19.1 (2021-04-01)
Bug Fixes
- Fixed crash (divide by zero) when compression dictionary is applied to a file containing only range tombstones.
6.19.0 (2021-03-21)
Bug Fixes
- Fixed the truncation error found in APIs/tools when dumping block-based SST files in a human-readable format. After fix, the block-based table can be fully dumped as a readable file.
- When hitting a write slowdown condition, no write delay (previously 1 millisecond) is imposed until delayed_write_rate is actually exceeded, with an initial burst allowance of 1 millisecond worth of bytes. Also, beyond the initial burst allowance, delayed_write_rate is now more strictly enforced, especially with multiple column families.
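One way to picture the delayed_write_rate behavior above is a token bucket whose capacity is 1 millisecond worth of bytes. The sketch below is a standalone model under that assumption, not RocksDB's write controller; the class WriteDelayModel, its method, and the choice to cap the credit at the burst size are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical token-bucket model of a delayed write rate (bytes/second).
class WriteDelayModel {
 public:
  explicit WriteDelayModel(std::uint64_t delayed_write_rate)
      : rate_(delayed_write_rate),
        burst_(delayed_write_rate / 1000),  // 1 ms worth of bytes
        credit_(burst_) {
    assert(rate_ > 0);
  }

  // Returns how many microseconds a writer should wait before writing
  // num_bytes at time now_micros. Zero means the rate is not yet exceeded.
  std::uint64_t GetDelayMicros(std::uint64_t now_micros,
                               std::uint64_t num_bytes) {
    if (now_micros > last_micros_) {
      // Earn credit for elapsed time, capped at the burst allowance.
      const std::uint64_t earned =
          (now_micros - last_micros_) * rate_ / 1000000;
      credit_ = std::min(burst_, credit_ + earned);
      last_micros_ = now_micros;
    }
    if (credit_ >= num_bytes) {
      credit_ -= num_bytes;
      return 0;
    }
    // Not enough credit: wait exactly long enough to earn the deficit.
    const std::uint64_t deficit = num_bytes - credit_;
    credit_ = 0;
    const std::uint64_t delay = deficit * 1000000 / rate_;
    last_micros_ += delay;  // the wait pre-pays the deficit
    return delay;
  }

 private:
  std::uint64_t rate_;
  std::uint64_t burst_;
  std::uint64_t credit_;
  std::uint64_t last_micros_ = 0;
};
```

With a rate of 1,000,000 bytes per second, the first 1,000 bytes pass without delay, and a further 500 bytes written at the same instant incur a 500 microsecond delay in this model.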
Public API change
- Changed default BackupableDBOptions::share_files_with_checksum to true and deprecated false because of potential for data loss. Note that accepting this change in behavior can temporarily increase backup data usage because files are not shared between backups using the two different settings. Also removed obsolete option kFlagMatchInterimNaming.
- Add a new option BlockBasedTableOptions::max_auto_readahead_size. RocksDB does auto-readahead for iterators on noticing more than two reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read up to max_auto_readahead_size, and max_auto_readahead_size can now be configured dynamically as well. Found that 256 KB readahead size provides the best performance, based on experiments, for auto readahead. Experiment data is in PR #3282. If the value is set to 0, no automatic prefetching will be done by rocksdb. Also, changing the value will only affect files opened after the change.
- Add support to extend DB::VerifyFileChecksums API to also verify blob files checksum.
- When using the new BlobDB, the amount of data written by flushes/compactions is now broken down into table files and blob files in the compaction statistics; namely, Write(GB) denotes the amount of data written to table files, while Wblob(GB) means the amount of data written to blob files.
- New default BlockBasedTableOptions::format_version=5 to enable new Bloom filter implementation by default, compatible with RocksDB versions >= 6.6.0.
- Add new SetBufferSize API to WriteBufferManager to allow dynamic management of memory allotted to all write buffers. This allows user code to adjust memory monitoring provided by WriteBufferManager as process memory needs change and datasets grow and shrink.
- Clarified the required semantics of Read() functions in FileSystem and Env APIs. Please ensure any custom implementations are compliant.
- For the new integrated BlobDB implementation, compaction statistics now include the amount of data read from blob files during compaction (due to garbage collection or compaction filters). Write amplification metrics have also been extended to account for data read from blob files.
- Add EqualWithoutTimestamp() to Comparator.
- Extend support to track blob files in SSTFileManager whenever a blob file is created/deleted. Blob files will be scheduled to delete via SSTFileManager and SSTFileManager will now take blob files into account while calculating size and space limits along with SST files.
- Add new Append and PositionedAppend API with checksum handoff to legacy Env.
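The auto-readahead growth described for BlockBasedTableOptions::max_auto_readahead_size in the list above can be illustrated with a small standalone model. This is a sketch of the stated rule only (start at 8 KB, double on each additional read, cap at the maximum, 0 disables prefetching), not RocksDB's implementation; the function NextAutoReadaheadSize and the constant kInitialAutoReadaheadSize are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical constant: the 8 KB starting size mentioned in the note.
constexpr std::size_t kInitialAutoReadaheadSize = 8 * 1024;

// Returns the readahead size for the next read, given the size used for the
// previous one (0 means auto-readahead has not started yet).
// max_auto_readahead_size == 0 models "no automatic prefetching".
std::size_t NextAutoReadaheadSize(std::size_t previous,
                                  std::size_t max_auto_readahead_size) {
  if (max_auto_readahead_size == 0) {
    return 0;  // prefetching disabled
  }
  if (previous == 0) {
    return std::min(kInitialAutoReadaheadSize, max_auto_readahead_size);
  }
  // In this model the caller never passes a size above the cap.
  assert(previous <= max_auto_readahead_size);
  // Double, but never exceed the configured maximum.
  return std::min(previous * 2, max_auto_readahead_size);
}
```

Starting from 8 KB with a 256 KB cap, the sequence in this model is 8, 16, 32, 64, 128, 256 KB and then stays at 256 KB.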
New Features
- Support compaction filters for the new implementation of BlobDB. Add FilterBlobByKey() to CompactionFilter. Subclasses can override this method so that compaction filters can determine whether the actual blob value has to be read during compaction. Use a new kUndetermined in CompactionFilter::Decision to indicate that further action is necessary for compaction filter to make a decision.
- Add support to extend retrieval of checksums for blob files from the MANIFEST when checkpointing. During backup, rocksdb can detect corruption in blob files during file copies.
- Add new options for db_bench --benchmarks: flush, waitforcompaction, compact0, compact1.
- Add an option to BackupEngine::GetBackupInfo to include the name and size of each backed-up file. Especially in the presence of file sharing among backups, this offers detailed insight into backup space usage.
- Enable backward iteration on keys with user-defined timestamps.
- Add statistics and info log for error handler: counters for bg error, bg io error, bg retryable io error, auto resume count, auto resume total retry number, and auto resume success; histogram for auto resume retry count in each recovery call. Note that each auto resume attempt will have one or multiple retries.
Behavior Changes
- During flush, only WAL sync retryable IO error is mapped to hard error, which will stall the writes. When WAL is used but only SST file write has retryable IO error, it will be mapped to soft error and write will not be affected.
RocksDB 6.16.4
6.16.4 (2021-03-30)
Bug Fixes
- Fix build on ppc64 and musl build.
RocksDB 6.17.3
6.17.3 (2021-02-18)
Bug Fixes
- Fixed a bug where WRITE_PREPARED and WRITE_UNPREPARED TransactionDB MultiGet() may return uncommitted data with snapshot.
6.17.2 (2021-02-05)
Bug Fixes
- Since 6.15.0, TransactionDB returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees. There are certain cases where range deletion can still be used on such DBs; see the API doc on TransactionDB::DeleteRange() for details.
- OptimisticTransactionDB now returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees.
6.17.1 (2021-01-28)
Behavior Changes
- Previously, when a retryable IO error occurred during compaction, it was mapped to a soft error and the BG error was set. However, auto resume was not called to clear the soft error, since compaction reschedules itself. With this change, when a retryable IO error occurs during compaction, the BG error is not set. The user is informed of the error via EventHelper.
6.17.0 (2021-01-15)
Behavior Changes
- When verifying full file checksum with DB::VerifyFileChecksums(), we now fail with Status::InvalidArgument if the name of the checksum generator used for verification does not match the name of the checksum generator used for protecting the file when it was created.
- Since RocksDB does not continue writing to the same file if a file write fails for any reason, the file scope write IO error is treated the same as retryable IO error. More information about error handling of file scope IO error is included in ErrorHandler::SetBGError.
Bug Fixes
- Versions older than 6.15 cannot decode the VersionEdits WalAddition and WalDeletion; fixed this by changing their encoded format to be ignorable by older versions.
- Fix a race condition between DB startups and shutdowns in managing the periodic background worker threads. One effect of this race condition could be the process being terminated.
Public API Change
- Add a public API WriteBufferManager::dummy_entries_in_cache_usage() which reports the size of dummy entries stored in cache (passed to WriteBufferManager). Dummy entries are used to account for DataBlocks.
RocksDB 6.16.3
6.16.3 (2021-02-05)
Bug Fixes
- Since 6.15.0, TransactionDB returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees. There are certain cases where range deletion can still be used on such DBs; see the API doc on TransactionDB::DeleteRange() for details.
- OptimisticTransactionDB now returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees.
6.16.2 (2021-01-21)
Bug Fixes
- Fix a race condition between DB startups and shutdowns in managing the periodic background worker threads. One effect of this race condition could be the process being terminated.
6.16.1 (2021-01-20)
Bug Fixes
- Versions older than 6.15 cannot decode the VersionEdits WalAddition and WalDeletion; fixed this by changing their encoded format to be ignorable by older versions.
6.16.0 (2020-12-18)
Behavior Changes
- Attempting to write a merge operand without explicitly configuring merge_operator now fails immediately, causing the DB to enter read-only mode. Previously, failure was deferred until the merge_operator was needed by a user read or a background operation.
- Since RocksDB does not continue writing to the same file if a file write fails for any reason, the file scope write IO error is treated the same as retryable IO error. More information about error handling of file scope IO error is included in ErrorHandler::SetBGError.
Bug Fixes
- Truncated WALs ending in incomplete records can no longer produce gaps in the recovered data when WALRecoveryMode::kPointInTimeRecovery is used. Gaps are still possible when WALs are truncated exactly on record boundaries; for complete protection, users should enable track_and_verify_wals_in_manifest.
- Fix a bug where compressed blocks read by MultiGet are not inserted into the compressed block cache when use_direct_reads = true.
- Fixed the issue of full scanning on obsolete files when there are too many outstanding compactions with ConcurrentTaskLimiter enabled.
- Fixed the logic of populating native data structure for read_amp_bytes_per_bit during OPTIONS file parsing on big-endian architecture. Without this fix, original code introduced in PR7659, when running on big-endian machine, can mistakenly store read_amp_bytes_per_bit (an uint32) in little endian format. Future access to read_amp_bytes_per_bit will give wrong values. Little endian architecture is not affected.
- Fixed prefix extractor with timestamp issues.
- Fixed a bug in atomic flush: in two-phase commit mode, the minimum WAL log number to keep is incorrect.
- Fixed a bug related to checkpoint in PR7789: if there are multiple column families, and the checkpoint is not opened as read only, then in rare cases, data loss may happen in the checkpoint. Since backup engine relies on checkpoint, it may also be affected.
New Features
- User defined timestamp feature supports CompactRange and GetApproximateSizes.
- Support getting aggregated table properties (kAggregatedTableProperties and kAggregatedTablePropertiesAtLevel) with DB::GetMapProperty, for easier access to the data in a structured format.
- Experimental option BlockBasedTableOptions::optimize_filters_for_memory now works with experimental Ribbon filter (as well as Bloom filter).
Public API Change
- Deprecated public but rarely-used FilterBitsBuilder::CalculateNumEntry, which is replaced with ApproximateNumEntries taking a size_t parameter and returning size_t.
- Added a new option track_and_verify_wals_in_manifest. If true, the log numbers and sizes of the synced WALs are tracked in MANIFEST, then during DB recovery, if a synced WAL is missing from disk, or the WAL's size does not match the recorded size in MANIFEST, an error will be reported and the recovery will be aborted. Note that this option does not work with secondary instance.
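The recovery-time check described for track_and_verify_wals_in_manifest can be sketched as a comparison between what the MANIFEST recorded and what is found on disk. The following is a minimal standalone model of that rule as stated in the note, not RocksDB's recovery code; the WalSizes alias and the VerifyTrackedWals function are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in: WAL log number -> size in bytes.
using WalSizes = std::map<std::uint64_t, std::uint64_t>;

// Returns true if every synced WAL recorded in the (modeled) MANIFEST exists
// on disk with the recorded size. On failure, *error describes the first
// problem found, which in the real DB would abort recovery.
bool VerifyTrackedWals(const WalSizes& recorded_in_manifest,
                       const WalSizes& found_on_disk, std::string* error) {
  assert(error != nullptr);
  for (const auto& entry : recorded_in_manifest) {
    const std::uint64_t log_number = entry.first;
    const std::uint64_t recorded_size = entry.second;
    const auto it = found_on_disk.find(log_number);
    if (it == found_on_disk.end()) {
      *error = "WAL " + std::to_string(log_number) + " is missing from disk";
      return false;
    }
    if (it->second != recorded_size) {
      *error = "WAL " + std::to_string(log_number) + " has size " +
               std::to_string(it->second) + ", MANIFEST recorded " +
               std::to_string(recorded_size);
      return false;
    }
  }
  error->clear();
  return true;
}
```

WALs that exist on disk but were never recorded are ignored in this model; whether the real check treats them differently is not stated in the note.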
RocksDB 6.15.5
6.15.5 (2021-02-05)
Bug Fixes
- Since 6.15.0, TransactionDB returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees. There are certain cases where range deletion can still be used on such DBs; see the API doc on TransactionDB::DeleteRange() for details.
- OptimisticTransactionDB now returns error Statuses from calls to DeleteRange() and calls to Write() where the WriteBatch contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees.
RocksDB 6.15.4
6.15.4 (2021-01-21)
Bug Fixes
- Fix a race condition between DB startups and shutdowns in managing the periodic background worker threads. One effect of this race condition could be the process being terminated.
6.15.3 (2021-01-07)
Bug Fixes
- For Java builds, fix errors due to missing compression library includes.
RocksDB 6.15.2
6.15.2 (2020-12-22)
Bug Fixes
- Fix failing RocksJava test compilation and add CI jobs
- Fix jemalloc compilation issue on macOS
- Fix build issues - compatibility with older gcc, older jemalloc libraries, docker warning when building i686 binaries
6.15.1 (2020-12-01)
Bug Fixes
- Truncated WALs ending in incomplete records can no longer produce gaps in the recovered data when WALRecoveryMode::kPointInTimeRecovery is used. Gaps are still possible when WALs are truncated exactly on record boundaries.
- Fix a bug where compressed blocks read by MultiGet are not inserted into the compressed block cache when use_direct_reads = true.
6.15.0 (2020-11-13)
Bug Fixes
- Fixed a bug in the following combination of features: indexes with user keys (format_version >= 3), indexes are partitioned (index_type == kTwoLevelIndexSearch), and some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache). The bug could cause keys to be truncated when read from the index leading to wrong read results or other unexpected behavior.
- Fixed a bug when indexes are partitioned (index_type == kTwoLevelIndexSearch), some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache), and partitions reads could be mixed between block cache and directly from the file (e.g., with enable_index_compression == 1 and mmap_read == 1, partitions that were stored uncompressed due to poor compression ratio would be read directly from the file via mmap, while partitions that were stored compressed would be read from block cache). The bug could cause index partitions to be mistakenly considered empty during reads leading to wrong read results.
- Since 6.12, memtable lookup should report unrecognized value_type as corruption (#7121).
- Since 6.14, fix false positive flush/compaction Status::Corruption failure when paranoid_file_checks == true and range tombstones were written to the compaction output files.
- Since 6.14, fix a bug that could cause a stalled write to crash with a mix of slowdown and no_slowdown writes (WriteOptions.no_slowdown=true).
- Fixed a bug which causes hang in closing DB when refit level is set in opt build. It was because ContinueBackgroundWork() was called in assert statement which is a no op. It was introduced in 6.14.
- Fixed a bug which causes Get() to return incorrect result when a key's merge operand is applied twice. This can occur if the thread performing Get() runs concurrently with a background flush thread and another thread writing to the MANIFEST file (PR6069).
- Reverted a behavior change silently introduced in 6.14.2, in which the effects of the ignore_unknown_options flag (used in option parsing/loading functions) changed.
- Reverted a behavior change silently introduced in 6.14, in which options parsing/loading functions began returning NotFound instead of InvalidArgument for option names not available in the present version.
- Fixed MultiGet bugs where it doesn't return valid data with user defined timestamp.
- Fixed a potential bug caused by evaluating TableBuilder::NeedCompact() before TableBuilder::Finish() in compaction job. For example, the NeedCompact() method of CompactOnDeletionCollector returned by built-in CompactOnDeletionCollectorFactory requires BlockBasedTable::Finish() to return the correct result. The bug can cause a compaction-generated file not to be marked for future compaction based on deletion ratio.
- Fixed a seek issue with prefix extractor and timestamp.
- Fixed a bug of encoding and parsing BlockBasedTableOptions::read_amp_bytes_per_bit as a 64-bit integer.
- Fixed the logic of populating native data structure for read_amp_bytes_per_bit during OPTIONS file parsing on big-endian architecture. Without this fix, original code introduced in PR7659, when running on big-endian machine, can mistakenly store read_amp_bytes_per_bit (an uint32) in little endian format. Future access to read_amp_bytes_per_bit will give wrong values. Little endian architecture is not affected.
Public API Change
- Deprecate BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache and BlockBasedTableOptions::pin_top_level_index_and_filter. These options still take effect until users migrate to the replacement APIs in BlockBasedTableOptions::metadata_cache_options. Migration guidance can be found in the API comments on the deprecated options.
- Add new API DB::VerifyFileChecksums to verify SST file checksum with corresponding entries in the MANIFEST if present. Current implementation requires scanning and recomputing file checksums.
Behavior Changes
- The dictionary compression settings specified in ColumnFamilyOptions::compression_opts now additionally affect files generated by flush and compaction to non-bottommost level. Previously those settings at most affected files generated by compaction to bottommost level, depending on whether ColumnFamilyOptions::bottommost_compression_opts overrode them. Users who relied on dictionary compression settings in ColumnFamilyOptions::compression_opts affecting only the bottommost level can keep the behavior by moving their dictionary settings to ColumnFamilyOptions::bottommost_compression_opts and setting its enabled flag.
- When the enabled flag is set in ColumnFamilyOptions::bottommost_compression_opts, those compression options now take effect regardless of the value in ColumnFamilyOptions::bottommost_compression. Previously, those compression options only took effect when ColumnFamilyOptions::bottommost_compression != kDisableCompressionOption. Now, they additionally take effect when ColumnFamilyOptions::bottommost_compression == kDisableCompressionOption (such a setting causes bottommost compression type to fall back to ColumnFamilyOptions::compression_per_level if configured, and otherwise fall back to ColumnFamilyOptions::compression).
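The fallback rules in the two items above can be written down as a small decision function. The sketch below is a standalone model of those rules as stated, not RocksDB code: the types Compression, CompressionOptsModel, and CfOptionsModel and both functions are hypothetical, and indexing compression_per_level directly by level (clamped to the last entry) is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced stand-ins for the real option types.
enum class Compression { kNone, kSnappy, kZSTD, kDisableOption };

struct CompressionOptsModel {
  int max_dict_bytes = 0;
  bool enabled = false;
};

struct CfOptionsModel {
  Compression compression = Compression::kSnappy;
  std::vector<Compression> compression_per_level;  // empty = not configured
  Compression bottommost_compression = Compression::kDisableOption;
  CompressionOptsModel compression_opts;
  CompressionOptsModel bottommost_compression_opts;
};

// Compression type used for the bottommost level.
Compression BottommostCompressionType(const CfOptionsModel& o,
                                      std::size_t bottommost_level) {
  // In this model, only bottommost_compression may hold kDisableOption.
  assert(o.compression != Compression::kDisableOption);
  if (o.bottommost_compression != Compression::kDisableOption) {
    return o.bottommost_compression;
  }
  if (!o.compression_per_level.empty()) {
    const std::size_t idx =
        std::min(bottommost_level, o.compression_per_level.size() - 1);
    return o.compression_per_level[idx];
  }
  return o.compression;
}

// Compression options used for the bottommost level: the bottommost-specific
// options win whenever their enabled flag is set, whatever the type setting.
const CompressionOptsModel& BottommostCompressionOpts(const CfOptionsModel& o) {
  return o.bottommost_compression_opts.enabled ? o.bottommost_compression_opts
                                               : o.compression_opts;
}
```

In this model the type and the options are resolved independently, which is the point of the second item: setting the enabled flag is enough for the bottommost options to apply, even when the type falls back to the per-level or default setting.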
New Features
- An EXPERIMENTAL new Bloom alternative that saves about 30% space compared to Bloom filters, with about 3-4x construction time and similar query times is available using NewExperimentalRibbonFilterPolicy.
RocksDB 6.14.6
6.14.6 (2020-12-01)
Bug Fixes
- Truncated WALs ending in incomplete records can no longer produce gaps in the recovered data when WALRecoveryMode::kPointInTimeRecovery is used. Gaps are still possible when WALs are truncated exactly on record boundaries.
RocksDB 6.14.5
6.14.5 (2020-11-15)
Bug Fixes
- Fix a bug of encoding and parsing BlockBasedTableOptions::read_amp_bytes_per_bit as a 64-bit integer.
6.14.4 (2020-11-05)
Bug Fixes
- Fixed a potential bug caused by evaluating TableBuilder::NeedCompact() before TableBuilder::Finish() in compaction job. For example, the NeedCompact() method of CompactOnDeletionCollector returned by built-in CompactOnDeletionCollectorFactory requires BlockBasedTable::Finish() to return the correct result. The bug can cause a compaction-generated file not to be marked for future compaction based on deletion ratio.
6.14.3 (2020-10-30)
Bug Fixes
- Reverted a behavior change silently introduced in 6.14.2, in which the effects of the ignore_unknown_options flag (used in option parsing/loading functions) changed.
- Reverted a behavior change silently introduced in 6.14, in which options parsing/loading functions began returning NotFound instead of InvalidArgument for option names not available in the present version.
6.14.2 (2020-10-21)
Bug Fixes
- Fixed a bug which causes hang in closing DB when refit level is set in opt build. It was because ContinueBackgroundWork() was called in assert statement which is a no op. It was introduced in 6.14.
6.14.1 (2020-10-13)
Bug Fixes
- Since 6.12, memtable lookup should report unrecognized value_type as corruption (#7121).
- Since 6.14, fix false positive flush/compaction Status::Corruption failure when paranoid_file_checks == true and range tombstones were written to the compaction output files.
- Fixed a bug in the following combination of features: indexes with user keys (format_version >= 3), indexes are partitioned (index_type == kTwoLevelIndexSearch), and some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache). The bug could cause keys to be truncated when read from the index leading to wrong read results or other unexpected behavior.
- Fixed a bug when indexes are partitioned (index_type == kTwoLevelIndexSearch), some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache), and partitions reads could be mixed between block cache and directly from the file (e.g., with enable_index_compression == 1 and mmap_read == 1, partitions that were stored uncompressed due to poor compression ratio would be read directly from the file via mmap, while partitions that were stored compressed would be read from block cache). The bug could cause index partitions to be mistakenly considered empty during reads leading to wrong read results.
6.14 (2020-10-09)
Bug fixes
- Fixed a bug after a CompactRange() with CompactRangeOptions::change_level set fails due to a conflict in the level change step, which caused all subsequent calls to CompactRange() with CompactRangeOptions::change_level set to incorrectly fail with a Status::NotSupported("another thread is refitting") error.
- Fixed a bug that the bottom most level compaction could still be a trivial move even if BottommostLevelCompaction.kForce or kForceOptimized is set.
Public API Change
- The methods to create and manage EncryptedEnv have been changed. The EncryptionProvider is now passed to NewEncryptedEnv as a shared pointer, rather than a raw pointer. Comparably, the CTREncryptionProvider now takes a shared pointer, rather than a reference, to a BlockCipher. CreateFromString methods have been added to BlockCipher and EncryptionProvider to provide a single API by which different ciphers and providers can be created, respectively.
- The internal classes (CTREncryptionProvider, ROT13BlockCipher, CTRCipherStream) associated with the EncryptedEnv have been moved out of the public API. To create a CTREncryptionProvider, one can either use EncryptionProvider::NewCTRProvider, or EncryptionProvider::CreateFromString("CTR"). To create a new ROT13BlockCipher, one can either use BlockCipher::NewROT13Cipher or BlockCipher::CreateFromString("ROT13").
- The EncryptionProvider::AddCipher method has been added to allow keys to be added to an EncryptionProvider. This API will allow future providers to support multiple cipher keys.
- Add a new option "allow_data_in_errors". When set, users opt in to error messages containing corrupted keys/values: corrupt keys and values will be included in messages, logs, statuses, etc., giving users useful information about the affected data. The option is false by default to prevent user data from being exposed, so data is currently redacted from logs, messages, and statuses by default.
- AdvancedColumnFamilyOptions::force_consistency_checks is now true by default, for more proactive DB corruption detection at virtually no cost (estimated two extra CPU cycles per million on a major production workload). Corruptions reported by these checks now mention "force_consistency_checks" in case a false positive corruption report is suspected and the option needs to be disabled (unlikely). Since existing column families have a saved setting for force_consistency_checks, only new column families will pick up the new default.
General Improvements
- The settings of the DBOptions and ColumnFamilyOptions are now managed by Configurable objects (see New Features). The same convenience methods to configure these options still exist but the backend implementation has been unified under a common implementation.
New Features
- Methods to configure, serialize, and compare objects -- such as TableFactory -- are exposed directly through the Configurable base class (from which these objects inherit). This change will allow for better and more thorough configuration management and retrieval in the future. The options for a Configurable object can be set via the ConfigureFromMap, ConfigureFromString, or ConfigureOption method. The serialized version of the options of an object can be retrieved via the GetOptionString, ToString, or GetOption methods. The list of options supported by an object can be obtained via the GetOptionNames method. The "raw" object (such as the BlockBasedTableOption) for an option may be retrieved via the GetOptions method. Configurable options can be compared via the AreEquivalent method. The settings within a Configurable object may be validated via the ValidateOptions method. The object may be initialized (at which point only mutable options may be updated) via the PrepareOptions method.
- Introduce options.check_flush_compaction_key_order with default value to be true. With this option, during flush and compaction, key order will be checked when writing to each SST file. If the order is violated, the flush or compaction will fail.
- Added is_full_compaction to CompactionJobStats, so that the information is available through the EventListener interface.
- Add more stats for MultiGet in Histogram to get number of data blocks, index blocks, filter blocks and sst files read from file system per level.
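The key-order check described for options.check_flush_compaction_key_order above amounts to comparing each key with the one written just before it. The following standalone sketch models that idea with plain bytewise string comparison; the class KeyOrderChecker is hypothetical, it does not use RocksDB's internal key comparator, and treating an equal key as a violation is an assumption of this model.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of an order check applied while writing one SST file.
class KeyOrderChecker {
 public:
  // Returns true if `key` sorts strictly after the previously added key.
  // A false return models the flush or compaction failing.
  bool Add(const std::string& key) {
    assert(!failed_);  // the job stops at the first violation
    if (has_last_ && key <= last_) {
      failed_ = true;
      return false;
    }
    last_ = key;
    has_last_ = true;
    return true;
  }

 private:
  std::string last_;
  bool has_last_ = false;
  bool failed_ = false;
};
```

A writer would create one checker per output file and call Add() before appending each key, aborting the job as soon as Add() returns false.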