LevelDB 源码解析之 Cache 缓存-伙伴云

LevelDB 源码解析之 Cache 缓存

网友投稿 767 2022-05-29

GitHub: https://github.com/storagezhang

Emai: debugzhang@163.com

LevelDB: https://github.com/google/leveldb

LevelDB 源码解析之 Cache 缓存

影响数据库读性能的一个重要组件是缓存。如果 LevelDB 的每一次读取都会带来一次磁盘 IO，这些磁盘 IO 就会让系统的整体效率非常低下。

所以，LevelDB 使用了一种基于 LRUCache 的缓存机制，用于缓存：

已打开的 sstable 文件对象和相关元数据；

sstable 中的 dataBlock 的内容。

这种缓存机制使得对热数据的读取尽量在 cache 中命中，避免 IO 调用。

cache 类

Cache 是一个抽象类，声明了一个全局函数：

Cache* NewLRUCache(size_t capacity) { return new ShardedLRUCache(capacity); }

该函数用来创建具有固定容量的缓存，此缓存实现使用最近最少使用的淘汰策略（LRU, least-recently-used），当数据所占内存达到一定阈值时，删掉最近最少使用的数据。

Cache 中的数据是文件的副本，其读取流程为：

在 cache 中查找特定的 key，若找到该 key，直接返回对应的 handle，若未找到，执行第 2 步；

从文件中读取 key 对应的 entry 到 cache 中，再返回相应的 handle。

接口

Cache 声明了一系列接口：

virtual ~Cache();

通过调用 deleter 删除所有的 entry。

struct Handle {};

缓存中所存储的项 entry 对应的句柄 handle；

除了保存 key-value 数据外还保存了一些维护信息；

Handle 是通用接口，此处仅定义空结构体。

virtual Handle* Insert(const Slice& key, void* value, size_t charge, void (*deleter)(const Slice& key, void* value)) = 0;

将一个对应 key-value 的 entry 插入缓存；

返回 entry 对应的 handle，当不再需要返回的 handle，调用者必须调用 this->Release(handle)；

entry 被删除时，key 和 value 将会传递给 deleter。

virtual Handle* Lookup(const Slice& key) = 0;

如果当前缓存中没有对应 key 的 entry，返回 nullptr；

否则返回 entry 对应的 handle，当不再需要返回的 handle 时，调用者必须调用 this->Release(handle)。

virtual void Release(Handle* handle) = 0;

释放调用 this->Lookup 查找到的 handle：

对应 entry 的引用计数减 1；

若引用计数为 0，调用 deleter 删除该 entry。

注意：handle 必须尚未释放。

virtual void* Value(Handle* handle) = 0;

返回调用 this->Lookup(key) 查找到的 handle 中封装的值。

注意：handle 必须尚未释放。

virtual void Erase(const Slice& key) = 0;

从缓存删除 key 对应的 entry。

注意：entry 将一直保留，直到 entry 对应的所有 handle 都被 this->Release(handle) 释放。

virtual uint64_t NewId() = 0;

返回一个新的数字 id。

共享同一缓存的多个客户端可以使用该标识对保存 key 的空间进行分区。

通常客户端将在启动时分配一个新的 id，并预先将该 id 添加到它保存 key 的缓存中。

virtual void Prune() {}

删除所有不活跃的 entry。

内存受限的应用程序可能希望调用此方法以减少内存使用。

virtual size_t TotalCharge() const = 0;

返回缓存内存消耗的估计值。

LRUHandle 结构体

LRUHandle 是一个双向循环链表，存储缓存中的 entry，它的功能有：

存储 key-value 数据；

作为 LRU 链表，是一个可变长度的堆分配结构；

记录引用计数并负责清理。

struct LRUHandle { // 存储的 value 对象的指针，可以存储任意类型的值 void* value; // 当引用计数 `refs` 降为 0 时的清理函数 void (*deleter)(const Slice&, void* value); // 哈希表通过链地址法解决哈希冲突，发生哈希冲突时指向下一个 entry LRUHandle* next_hash; // LRU 链表双向指针 LRUHandle* next; LRUHandle* prev; // 分配的可以消耗的内存量 size_t charge; // key 的字节数 size_t key_length; // entry 是否存在缓存中 bool in_cache; // 引用计数，记录 entry 被不同缓存的引用，用于 entry 删除 uint32_t refs; // `key()` 对应的哈希，用于快速分区和比较 uint32_t hash; // 存储 key 的开始地址，结合 `key_length` 获取真正的 key // GCC 支持 C99 标准，允许定义 char key_data[] 这样的柔性数组（Flexible Array)。 // C++ 标准并不支持柔性数组的实现，这里定义为 key_data[1]，这也是 c++ 中的标准做法。 char key_data[1]; Slice key() const { // 只有当当前节点是空链表头时，`next` 才会等于 `this` // 链表是循环双向链表，空链表的头节点 next 和 prev 都指向自己构成环 // 链表头是冗余的 handle，不存储数据，只利用它的 next 和 prev assert(next != this); return Slice(key_data, key_length); } };

HandleTable 类

LevelDB 实现了一个内部哈希表 HandleTable，它消除了大量的移植漏洞，比一些编译器/运行时组合中的内置哈希表实现更快。例如，随机读的速度比 g++4.4.3 的内置哈希表提高了约 5%。

成员变量

每一个 LRUHandle 即是 HashTable 中的一个哈希桶，相同哈希值的 entry 通过 next_hash 连成链表，同时这些 entry 也通过 next 和 prev 连成为双向链表，最新插入的数据排在链表尾部。

class HandleTable { public: HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); } ~HandleTable() { delete[] list_; } private: // list 数组的长度，即哈希桶的个数 uint32_t length_; // 存储的元素总个数，当 elems > length 时，扩展 list 数组并重新 hash uint32_t elems_; // 存储 entry 的数组，即哈希桶，初始大小为 4，动态扩展，成倍增长 LRUHandle** list_; } };

Resize

Resize 函数用于元素增加时动态扩展数组：

void Resize() { // 初始化时，哈希桶的个数为 4 uint32_t new_length = 4; // 随着不同哈希值的增加动态扩展，每次扩展为原容量的两倍 while (new_length < elems_) { new_length *= 2; } // 按照新容量申请内存，分配给新链表，并将原数据复制过去 LRUHandle** new_list = new LRUHandle*[new_length]; memset(new_list, 0, sizeof(new_list[0]) * new_length); uint32_t count = 0; for (uint32_t i = 0; i < length_; i++) { LRUHandle* h = list_[i]; while (h != nullptr) { LRUHandle* next = h->next_hash; uint32_t hash = h->hash; LRUHandle** ptr = &new_list[hash & (new_length - 1)]; h->next_hash = *ptr; *ptr = h; h = next; count++; } } assert(elems_ == count); // 释放原数组 delete[] list_; // 更新哈希表数据 list_ = new_list; length_ = new_length; }

LevelDB 只在 entry 增长的时候扩大 list_，没有缩小，这符合 LevelDB 的使用场景。

FindPointer

FindPointer 函数返回适合所传入的 key 和 hash 的插入位置：

LRUHandle** FindPointer(const Slice& key, uint32_t hash) { // list_[hash & (length_ - 1)] 中存储着 next_hash 的地址 // ptr 等于 next_hash 的地址 // *ptr 等于其指向的 next_hash，即指向 next_hash 所连成的链表 // **ptr 等于 next_hash 所指向的 LRUHandle，即指向具体的 entry LRUHandle** ptr = &list_[hash & (length_ - 1)]; // 利用 *ptr 遍历这个链表，直到找到 key 对应的 entry // 如果没有找到，ptr 就指向链表的尾部，这最后一个 next_hash 的值是 nullptr while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) { ptr = &(*ptr)->next_hash; } // 返回符合条件的 next_hash 的地址 return ptr; }

Lookup，Insert 和 Remove

有了 FindPointer 返回的指针，查找函数 Lookup 直接读取指针指向的值:

LRUHandle* Lookup(const Slice& key, uint32_t hash) { return *FindPointer(key, hash); }

插入函数 Insert 直接在指针所指向的位置上插入：

LRUHandle* Insert(LRUHandle* h) { LRUHandle** ptr = FindPointer(h->key(), h->hash); LRUHandle* old = *ptr; h->next_hash = (old == nullptr ? nullptr : old->next_hash); *ptr = h; if (old == nullptr) { ++elems_; // 当存储元素个数大于哈希桶个数时，扩展哈希桶数组 // 因为每个缓存项都相当大，所以我们的目标是使链表的平均长度较小（<=1）。 if (elems_ > length_) { Resize(); } } return old; }

删除函数 Remove 直接删除指针所指向的 next_hash 所指向的 entry：

LRUHandle* Remove(const Slice& key, uint32_t hash) { LRUHandle** ptr = FindPointer(key, hash); LRUHandle* result = *ptr; if (result != nullptr) { *ptr = result->next_hash; --elems_; } return result; }

LRUCache 类

LRUCache 是 LRUCache 的内部实现，不对外暴露接口，作为 ShardedLRUCache 的分片存在。

成员变量

class LRUCache { private: // 缓存的容量，在初始化时指定 size_t capacity_; // 互斥锁，保护下列数据 mutable port::Mutex mutex_; // 缓存中所有 entry 的 charge 总和 // usage 必须小于 capacity size_t usage_ GUARDED_BY(mutex_); // 存储 entry 的循环双向链表，链表头冗余，同时维护使用热度 // lru.next 指向最旧的 entry，lru.prev 指向最新的 entry // 当缓存满了时，先从 lru.next 开始淘汰 entry // lru 保存 refs==1，并且 in_cache==true 的 entry LRUHandle lru_ GUARDED_BY(mutex_); // 存储正在使用的 entry 的循环双向链表，链表头冗余 // 这些 entry 不能被淘汰。 LRUHandle in_use_ GUARDED_BY(mutex_); // 保存所有 entry 的哈希表，用于快速查找数据 HandleTable table_ GUARDED_BY(mutex_); };

构造函数和析构函数

LRUCache::LRUCache() : capacity_(0), usage_(0) { // lru 和 in_use 都是循环双向链表 // 空链表的头节点 next 和 prev 都指向自己构成环，链表头冗余 lru_.next = &lru_; lru_.prev = &lru_; in_use_.next = &in_use_; in_use_.prev = &in_use_; }

LRUCache::~LRUCache() { // in_use 链表不能为空 assert(in_use_.next == &in_use_); for (LRUHandle* e = lru_.next; e != &lru_;) { LRUHandle* next = e->next; // entry 必须在缓存中 assert(e->in_cache); // 将 entry 移除缓存 e->in_cache = false; // entry 的引用计数必须为 1 assert(e->refs == 1); // 删除 entry Unref(e); e = next; } }

Cache 相关接口实现

Lookup

Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) { MutexLock l(&mutex_); // 借助哈希表快速定位 entry LRUHandle* e = table_.Lookup(key, hash); if (e != nullptr) { // 如果找到，增加 entry 的引用计数 Ref(e); } return reinterpret_cast(e); } void LRUCache::Ref(LRUHandle* e) { if (e->refs == 1 && e->in_cache) { // 如果当前 entry 在 lru 链表中，需要将其移动到 in_use 链表，不需要等待淘汰 LRU_Remove(e); // 添加到链表头部 LRU_Append(&in_use_, e); } e->refs++; }

Release

void LRUCache::Release(Cache::Handle* handle) { MutexLock l(&mutex_); // 对 entry 解引用 Unref(reinterpret_cast(handle)); } void LRUCache::Unref(LRUHandle* e) { assert(e->refs > 0); // 引用计数减少 e->refs--; if (e->refs == 0) { // 当引用计数为 0 时进行清理 assert(!e->in_cache); (*e->deleter)(e->key(), e->value); free(e); } else if (e->in_cache && e->refs == 1) { // 当引用计数为 1 时移动到 lru 链表等待淘汰 LRU_Remove(e); // 添加到链表头部 LRU_Append(&lru_, e); } }

Insert

Cache::Handle* LRUCache::Insert(const Slice& key, uint32_t hash, void* value, size_t charge, void (*deleter)(const Slice& key, void* value)) { MutexLock l(&mutex_); LRUHandle* e = reinterpret_cast(malloc(sizeof(LRUHandle) - 1 + key.size())); e->value = value; e->deleter = deleter; e->charge = charge; e->key_length = key.size(); e->hash = hash; e->in_cache = false; // 返回 handle，引用计数加一 e->refs = 1; std::memcpy(e->key_data, key.data(), key.size()); if (capacity_ > 0) { // 加入缓存，引用计数加一 e->refs++; e->in_cache = true; LRU_Append(&in_use_, e); usage_ += charge; FinishErase(table_.Insert(e)); } else { // 不缓存（支持 capacity==0，这表示关闭缓存 // next 会影响 `key()` 中的断言，因此必须对其进行初始化 e->next = nullptr; } while (usage_ > capacity_ && lru_.next != &lru_) { // entry 的总消耗大于缓存的容量 LRUHandle* old = lru_.next; assert(old->refs == 1); bool erased = FinishErase(table_.Remove(old->key(), old->hash)); if (!erased) { // 在编译 NDEBUG 时避免使用未使用的变量 assert(erased); } } return reinterpret_cast(e); }

Erase

bool LRUCache::FinishErase(LRUHandle* e) { if (e != nullptr) { assert(e->in_cache); // 从 in_use 链表中删除 LRU_Remove(e); e->in_cache = false; usage_ -= e->charge; // 解引用 Unref(e); } return e != nullptr; } void LRUCache::Erase(const Slice& key, uint32_t hash) { MutexLock l(&mutex_); FinishErase(table_.Remove(key, hash)); }

Prune

void LRUCache::Prune() { MutexLock l(&mutex_); // 清空 lru 链表 while (lru_.next != &lru_) { LRUHandle* e = lru_.next; assert(e->refs == 1); bool erased = FinishErase(table_.Remove(e->key(), e->hash)); if (!erased) { // 在编译 NDEBUG 时避免使用未使用的变量 assert(erased); } } }

entry 的引用计数

LRUCache 中的 entry 可以分为四个状态：

图片来源：http://kaiyuan.me/2017/06/12/leveldb-05/

in use (ref=2)：

该 entry 在 table 中，并且在 in_use 链表中；由于该 entry 既被外部 handle 引用，也被 in_use 链表使用，因此有 ref=2；

Insert 函数创建一个新的 entry 保存相应的 key-value 数据，并加入 in_use 链表的表头，然后返回对应的 handle 到外部，并将 ref 设置为 2；

in lru (ref=1)：

该 entry 在 table 中，并且在 lru 链表中；由于该 entry 只被 lru 链表使用，因此有 ref=1；

Lookup 函数和 Release 函数成对出现。Lookup 查找 key 对应的 entry，并返回 entry 对应的 handle 给外部，增加一个引用， ref 加 1；Release 函数则相反，释放 handle，减少一个引用，ref 减 1；

not in lru, not in table (ref=1)：

该 entry 不在链表中也不在 table 中，但是仍然被外部 handle 引用且 handle 未释放，因此 ref=1；

Erase 函数将 entry 从 table 和 in_use 中删除，减少一个引用，ref 减 1，但是若此 entry 还被外部 handle 引用，不能直接释放 entry 占用的内存，必须等外部使用者调用 Release 函数之后才能释放内存；

not in lru, not in table (ref=0)：该 entry 不在链表中也不在 table 中，也不被外部使用，因此 ref=0；

ShardedLRUCache 类

ShardedLRUCache 是 LevelDB 对外暴露的 LRUCache。

static const int kNumShardBits = 4; static const int kNumShards = 1 << kNumShardBits; class ShardedLRUCache : public Cache { private: LRUCache shard_[kNumShards]; port::Mutex id_mutex_; uint64_t last_id_; public: explicit ShardedLRUCache(size_t capacity) : last_id_(0) { const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards; for (int s = 0; s < kNumShards; s++) { shard_[s].SetCapacity(per_shard); } } }

ShardedLRUCache 中定义了 16 个 LRUCache，每次进行插入或者查找时，使用哈希值的最高四位判断当前应该在哪个 LRUCache 中查找，然后在找到的 LRUCache 中进行相应的操作。

static inline uint32_t HashSlice(const Slice& s) { return Hash(s.data(), s.size(), 0); } static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }

因为 LRUCache 的查找、插入和释放均需要加锁，而这种分片的方式能够提高 LevelDB 对于缓存的并发访问，也就是从侧面降低了锁的粒度，提高访问效率。

Cache 相关接口实现

ShardedLRUCache 的接口实现非常简单，它自己只进行哈希值的计算，然后选择相应的 LRUCache，调用相应 LRUCache 的相关接口实现。

Handle* Insert(const Slice& key, void* value, size_t charge, void (*deleter)(const Slice& key, void* value)) override { const uint32_t hash = HashSlice(key); return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter); } Handle* Lookup(const Slice& key) override { const uint32_t hash = HashSlice(key); return shard_[Shard(hash)].Lookup(key, hash); } void Release(Handle* handle) override { LRUHandle* h = reinterpret_cast(handle); shard_[Shard(h->hash)].Release(handle); } void Erase(const Slice& key) override { const uint32_t hash = HashSlice(key); shard_[Shard(hash)].Erase(key, hash); } void* Value(Handle* handle) override { return reinterpret_cast(handle)->value; } uint64_t NewId() override { MutexLock l(&id_mutex_); return ++(last_id_); } void Prune() override { for (int s = 0; s < kNumShards; s++) { shard_[s].Prune(); } } size_t TotalCharge() const override { size_t total = 0; for (int s = 0; s < kNumShards; s++) { total += shard_[s].TotalCharge(); } return total; }

存储数据库数据结构

数据结构的定义是什么（数据结构指的是什么）

767 2022-05-29

LevelDB 源码解析之 Cache 缓存

数据结构的定义是什么（数据结构指的是什么）

常用的数据结构有哪些呢（数据结构都有哪些主要结构）

程序员为什么要学数据结构？（程序员为什么要学数据结构）

推荐文章

企业生产管理是什么，企业生产管理软件

进盘点进销存软件排行榜前十名

进销存系统哪个简单好用？进销存系统优点

工厂生产管理（工厂生产管理流程及制度）

生产管理软件，机械制造业生产管理，制造业生产过程管理软件

进销存软件和ERP有什么区别？进销存与erp软件理解

进销存如何进行库存管理

如何利用excel制作销售订单管理系统？

数据库订单管理系统有哪些功能？数据库订单管理系统怎么设计？

什么是数据库管理系统？

最近发表

热评文章

零代码开发是什么？2022低代码平台排行榜

智能进销存库存管理系统（智慧进销存）

在线文档哪家强？8款在线文档编辑软件推荐

WPS2016怎么绘制简单的价格表?

家居定制平台是什么？

简单进销存库存管理盘点

友情链接

LevelDB 源码解析之 Cache 缓存

微信扫一扫：分享

推荐文章

最近发表

热评文章

友情链接