Block-Based SSTable Format

Note

The block-based format is the default SSTable format in RocksDB.

SSTable stands for Static Sorted Table; it is the basic file in which RocksDB stores data.

Some Basic Structure

A BlockHandle has the following format:

label   type
offset  varint64
size    varint64
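
A BlockHandle is therefore two varint64 values written back to back. The following is a minimal sketch of how such a handle could be encoded and decoded, assuming the usual little-endian base-128 varint encoding (7 payload bits per byte, high bit set on every byte except the last). The function names are illustrative and are not part of RocksDB.

```python
from typing import Tuple


def encode_varint64(value: int) -> bytes:
    """Encode a non-negative 64-bit integer as a base-128 varint."""
    if value < 0 or value >= 1 << 64:
        raise ValueError("value out of range for varint64")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)  # final byte, continuation bit clear
    return bytes(out)


def decode_varint64(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint64 is longer than 10 bytes")


def encode_block_handle(offset: int, size: int) -> bytes:
    """Serialize a handle as offset followed by size (hypothetical helper)."""
    return encode_varint64(offset) + encode_varint64(size)


def decode_block_handle(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (offset, size, next_pos) for the handle starting at buf[pos]."""
    offset, pos = decode_varint64(buf, pos)
    size, pos = decode_varint64(buf, pos)
    return offset, size, pos
```

Because both fields are varints, a handle has no fixed length: small offsets and sizes take one byte each, while large ones take up to ten.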

Top Level Structure

label             type                note
data blocks       data_block[N]       sorted order
meta blocks       meta_block[K]       filter, stats, compression dictionary, range deletion
metaindex blocks  metaindex_block[K]
index blocks      index_block[N]      N for a one-level index; not N for a two-level index
footer            footer              fixed size

0. Data Block

Note

This block is built by block_builder.cc in the RocksDB source tree.

The data block has the following format:

label          type
groups         group[groups_count]
groups_offset  fixed32[groups_count]
groups_count   fixed32
compress_type  char
CRC32          fixed32
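
Since the fixed-size fields sit at the end of the block, a reader works backwards from the last byte. The sketch below splits a raw block into the fields listed above. It assumes that fixed32 values are little-endian, that groups_count is a plain count, and that the block is stored uncompressed (compress_type 0), so the offset array can be read directly. It does not verify the checksum, and the names DataBlock and split_data_block are hypothetical.

```python
import struct
from typing import List, NamedTuple

TRAILER_SIZE = 5  # compress_type (1 byte) + CRC32 (4 bytes)


class DataBlock(NamedTuple):
    groups: bytes             # concatenated group bytes
    groups_offset: List[int]  # offsets into `groups`
    compress_type: int
    crc32: int                # stored checksum, not verified here


def split_data_block(raw: bytes) -> DataBlock:
    """Split an uncompressed raw block into its fields, reading from the end."""
    if len(raw) < TRAILER_SIZE + 4:
        raise ValueError("block too short")
    compress_type = raw[-5]
    (crc32,) = struct.unpack_from("<I", raw, len(raw) - 4)
    if compress_type != 0:
        # Assumption: 0 means "no compression"; anything else would need
        # decompressing before the offset array can be located.
        raise ValueError("this sketch only handles uncompressed blocks")
    contents = raw[:-TRAILER_SIZE]
    (groups_count,) = struct.unpack_from("<I", contents, len(contents) - 4)
    offsets_start = len(contents) - 4 - 4 * groups_count
    if offsets_start < 0:
        raise ValueError("groups_count does not fit in the block")
    groups_offset = list(
        struct.unpack_from("<%dI" % groups_count, contents, offsets_start)
    )
    return DataBlock(contents[:offsets_start], groups_offset, compress_type, crc32)
```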

The data group has the following format:

label           type                  note
shared_bytes    varint32              length of the key prefix shared with the previous key; 0 at a restart point
unshared_bytes  varint32              length of the unshared key bytes
value_length    varint32              length of the value
key_delta       char[unshared_bytes]  unshared key bytes
value           char[value_length]    value bytes
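
The fields above describe prefix compression: each key stores only the bytes that differ from the previous key, and a restart point stores the full key so that a reader can begin decoding there. The following is a minimal sketch of that scheme, treating each group as one key/value record and assuming keys arrive in sorted order. The restart_interval parameter and all function names are illustrative, not RocksDB's API.

```python
from typing import Iterable, Iterator, List, Tuple


def put_varint32(out: bytearray, value: int) -> None:
    """Append value as a base-128 varint."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def get_varint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def encode_entries(
    entries: Iterable[Tuple[bytes, bytes]], restart_interval: int = 16
) -> Tuple[bytes, List[int]]:
    """Prefix-compress sorted (key, value) pairs; return (bytes, restart offsets)."""
    out = bytearray()
    restarts: List[int] = []
    prev_key = b""
    for i, (key, value) in enumerate(entries):
        if i % restart_interval == 0:
            restarts.append(len(out))  # restart point: store the full key
            shared = 0
        else:
            shared = shared_prefix_len(prev_key, key)
        put_varint32(out, shared)
        put_varint32(out, len(key) - shared)
        put_varint32(out, len(value))
        out += key[shared:]
        out += value
        prev_key = key
    return bytes(out), restarts


def decode_entries(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Rebuild the full keys by re-attaching the shared prefix."""
    pos = 0
    prev_key = b""
    while pos < len(buf):
        shared, pos = get_varint32(buf, pos)
        unshared, pos = get_varint32(buf, pos)
        value_length, pos = get_varint32(buf, pos)
        key = prev_key[:shared] + buf[pos:pos + unshared]
        pos += unshared
        value = buf[pos:pos + value_length]
        pos += value_length
        yield key, value
        prev_key = key
```

A smaller restart interval gives more entry points for searching at the cost of storing more full keys.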

1. Meta Block

Note

The meta block is also built by block_builder.cc in the RocksDB source tree.

1.0. Filter Meta Block

1. Full filter: a single filter block for the entire SSTable.
2. Partitioned filter: used when a single filter block would be too big. In this case the filter has two levels: a top-level index that points to the second-level filter blocks, and the filter blocks themselves, which hold the actual filter data.

1.1. Properties Block

label  type
props  K/V[P]

The default properties are:

  • data size
  • index size
  • filter size
  • raw key size ; size of the keys before any processing (such as compression)
  • raw value size ; size of the values before any processing (such as compression)
  • number of entries
  • number of data blocks

1.2. Compression Dictionary

Note

This applies only to the bottommost level.

1.3. Range Deletion

Note

A range deletion can only become obsolete (and be dropped) during compaction to the bottommost level.

label            note
User Key         the range’s begin key
Sequence Number  the sequence number at which the range deletion was inserted into the DB
Value Type       kTypeRangeDeletion
Value            the range’s end key
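
To illustrate how these fields are used together, the sketch below checks whether a range deletion hides a given key version. It assumes the range is half-open (begin key inclusive, end key exclusive), that keys compare bytewise, and that a range deletion only affects key versions with a smaller sequence number. RangeTombstone and covers are hypothetical names.

```python
from typing import NamedTuple


class RangeTombstone(NamedTuple):
    begin_key: bytes  # User Key: the range's begin key
    seq: int          # Sequence Number of the range deletion
    end_key: bytes    # Value: the range's end key


def covers(tombstone: RangeTombstone, user_key: bytes, key_seq: int) -> bool:
    """True if the tombstone hides the version of user_key written at key_seq."""
    in_range = tombstone.begin_key <= user_key < tombstone.end_key
    is_older = key_seq < tombstone.seq  # assumption: only older versions are hidden
    return in_range and is_older
```

For example, a tombstone over [b"b", b"d") at sequence number 10 hides b"c" written at sequence 9, but not b"c" written at sequence 11 and not b"d" at any sequence.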

2. Meta Index

The metaindex block holds one entry per meta block:

K : name of the meta block
V : BlockHandle pointing to the corresponding meta block

label             type
metaindex blocks  metaindex_block[K]

3. Index Block

3.0. One-Level

The index block holds one entry per data block:

K : a string >= the last key in that data block and < the first key in the next data block
V : BlockHandle of the data block

label         type
index blocks  index_block[N]
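
Because every index key is an upper bound for its own data block and sorts before the first key of the next one, a lookup reduces to finding the first index entry whose key is >= the target. The following is a minimal sketch using binary search, assuming the index has already been decoded into a sorted list of (key, handle) pairs, with each handle represented as an (offset, size) tuple. The function name is illustrative.

```python
import bisect
from typing import List, Optional, Tuple

Handle = Tuple[int, int]  # (offset, size), as in a BlockHandle


def find_data_block(index: List[Tuple[bytes, Handle]], target: bytes) -> Optional[Handle]:
    """Return the handle of the only data block that may contain target."""
    keys = [key for key, _ in index]
    i = bisect.bisect_left(keys, target)  # first entry with key >= target
    if i == len(index):
        return None  # target sorts after every key in this SSTable
    return index[i][1]
```

With an index of [(b"c", (0, 100)), (b"m", (105, 80)), (b"z", (190, 60))], a lookup for b"d" lands on the second entry, so only the data block at offset 105 needs to be read.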

3.1. Two-Level

Note

This layout is used when kTwoLevelIndexSearch is enabled.

label type
1st index blocks  
2nd index blocks