OC Internals (6): Analysis of cache_t

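The nature of `cache_t`

When a method is called on a class, the runtime uses the SEL (method selector) to look up the IMP (pointer to the method implementation) in memory. To make dispatch faster, so that the method lists do not have to be traversed on every call, the runtime provides the cache_t structure. cache_t stores the SEL and IMP of every method that has been called as bucket_t entries inside the class structure, so later lookups can hit the cache directly. The receiver is passed to the insert function but is not stored in the bucket.

Structure diagram:

classDiagram
LGPerson --|> cache_t
bucket_t <|-- cache_t
class LGPerson{
isa
superclass
cache
bits
}
class cache_t{
_buckets
_mask
_flags
_occupied
}
class bucket_t{
_sel
_imp
}

cache_t structure

Following the structure diagram, we start with the type of cache, cache_t, and look up the cache_t structure in the objc4-818.2 source.

cache_t source: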

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask; // 8 bytes
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask; // 4 bytes
#if __LP64__
            uint16_t                   _flags;     // 2 bytes
#endif
            uint16_t                   _occupied;  // 2 bytes
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache; // 8 bytes
    };

    /*
     #if defined(__arm64__) && __LP64__
     #if TARGET_OS_OSX || TARGET_OS_SIMULATOR
     // arm64 macOS or arm64 simulator
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
     #else
     // arm64 device
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16
     #endif
     #elif defined(__arm64__) && !__LP64__
     // 32-bit device
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
     #else
     // everything else: x86_64 macOS and simulators
     #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
     #endif
     ****** The code in between branches on the architecture and mainly
     defines the masks used to extract mask and buckets on each of them
    */

public:
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    unsigned capacity() const;
    struct bucket_t *buckets() const;
    Class cls() const;
    void insert(SEL sel, IMP imp, id receiver);

    // the remaining members are mostly other methods and are omitted here

};

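Conclusions:

_bucketsAndMaybeMask is a uintptr_t that occupies 8 bytes. Like bits in isa_t, it is a pointer-sized value that holds an address
The union contains one struct and one pointer to a struct, _originalPreoptCache
The struct has three members: _maybeMask, _flags and _occupied. __LP64__ is defined on 64-bit Unix and Unix-like systems (Linux and macOS), where long and pointers are 64 bits wide
_originalPreoptCache and the struct are mutually exclusive. _originalPreoptCache is the preoptimized cache that exists initially; since we are studying the cache of a class at run time, this member is almost never used here
cache_t provides public methods to read its values, and it derives the masks for mask and buckets from the architecture

cache_t exposes buckets(), which resembles methods() in class_data_bits_t: in both cases the value is obtained through a method.

Figure: buckets()

`bucket_t` structure

Next we step into the bucket_t structure.

Source: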

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__ // device
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    ....
    // the methods below are omitted
};

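Conclusions:

bucket_t distinguishes arm64 devices from everything else, but the members are the same, _sel and _imp; only their order differs
bucket_t stores _sel and _imp, so what the cache holds is methods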

cache_t overall structure diagram

Structure diagram:

classDiagram
objc_class --|> cache_t_device
objc_class --|> cache_t_simulator_macOS
cache_t_simulator_macOS --|> bucket_t_non_device
cache_t_device --|> bucket_t_device
cache_t_device --|> maskAndBuckets_note
cache_t_device --|> cache_t_mask_and_buckets
class objc_class{
Class ISA
Class superclass
cache_t cache
class_data_bits_t bits
}
class cache_t_simulator_macOS{
struct bucket_t *_buckets
mask_t _maybeMask
uint16_t _flags
uint16_t _occupied
}
class cache_t_device{
uintptr_t _bucketsAndMaybeMask
mask_t _maybeMask
uint16_t _flags
uint16_t _occupied

capacity()
bucket_t *buckets()
mask_t occupied()
void incrementOccupied()
void setBucketsAndMask()
void reallocate()
void insert()
}
class bucket_t_non_device{
explicit_atomic _sel
explicit_atomic _imp
}
class bucket_t_device{
explicit_atomic _imp
explicit_atomic _sel
}
class maskAndBuckets_note{
mask and buckets are stored together to save memory and simplify reads
}
class cache_t_mask_and_buckets{
maskShift = 48
maskZeroBits = 4
maxMask = ((uintptr_t)1 << (64 - maskShift)) - 1
static constexpr uintptr_t bucketsMask = ((uintptr_t)1 << (maskShift - maskZeroBits)) - 1
}
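The constants in the diagram can be tried out in isolation. The following is a minimal sketch, assuming the layout in which the mask lives in the top 16 bits and the buckets pointer in the low 44 bits of one 64-bit word. The helper names pack_buckets_and_mask, unpack_buckets and unpack_mask are hypothetical and are not runtime functions.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as listed in the diagram above. */
enum { MASK_SHIFT = 48, MASK_ZERO_BITS = 4 };
#define MAX_MASK     ((((uint64_t)1) << (64 - MASK_SHIFT)) - 1)
#define BUCKETS_MASK ((((uint64_t)1) << (MASK_SHIFT - MASK_ZERO_BITS)) - 1)

/* Hypothetical helper: mask goes into the top 16 bits, the pointer below it. */
static uint64_t pack_buckets_and_mask(uint64_t buckets_addr, uint64_t mask) {
    assert(mask <= MAX_MASK);
    return (mask << MASK_SHIFT) | (buckets_addr & BUCKETS_MASK);
}

/* Hypothetical helper: recover the buckets address from the packed word. */
static uint64_t unpack_buckets(uint64_t packed) {
    return packed & BUCKETS_MASK;
}

/* Hypothetical helper: recover the mask from the packed word. */
static uint64_t unpack_mask(uint64_t packed) {
    return packed >> MASK_SHIFT;
}
```

Packing a buckets address together with a mask of 7 and unpacking again returns both values unchanged, which is the whole point of sharing one word: a single load gives the reader both pieces.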

Breakpoint debugging

Create an LGPerson class with a few custom methods, instantiate LGPerson in main, and then debug with lldb.

Code:

#import <Foundation/Foundation.h>

@interface LGPerson : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic) int age;
@property (nonatomic, strong) NSString *hobby;

- (void)saySomething;
+ (void)sayHappy;
@end

@implementation LGPerson

- (instancetype)init {
    if (self = [super init]) {
        self.name = @"Cooci";
    }
    return self;
}

- (void)saySomething {
    NSLog(@"%s", __func__);
}

+ (void)sayHappy {
    NSLog(@"LGPerson say : %s", __func__);
}
@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LGPerson *p  = [LGPerson alloc];
        Class pClass = [LGPerson class];
        NSLog(@"%@", pClass);
    }
    return 0;
}

lldb session:

(lldb) p/x pClass
(Class) $0 = 0x00000001000084f0 LGPerson
(lldb) p/x 0x00000001000084f0 + 0x10
(long) $1 = 0x0000000100008500
(lldb) p/x (cache_t *)$1
(cache_t *) $2 = 0x0000000100008500
(lldb) p *$2
(cache_t) $3 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic = {
      Value = 4298515312
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic = {
          Value = 0
        }
      }
      _flags = 32808
      _occupied = 0
    }
    _originalPreoptCache = {
      std::__1::atomic = {
        Value = 0x0000802800000000
      }
    }
  }
}
(lldb) p/x $3.buckets()
(bucket_t *) $4 = 0x0000000100362370
(lldb) p *$4
(bucket_t) $5 = {
  _sel = {
    std::__1::atomic = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 0
    }
  }
}
(lldb) 
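
Conclusions:

The cache member sits 16 bytes (0x10) past the start of the class, so the address of cache is the class address + 0x10
buckets() in cache_t points to the start of a block of memory, which is also the address of the first bucket
The remaining buckets can be printed with p $3.buckets()[index], which shows the _sel and _imp of each one
The LGPerson object has not called any instance method yet, so buckets holds no cached method data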
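The 0x10 offset used in the lldb session follows directly from the layout of the first members of the class. Below is a small sketch for a 64-bit build; mirror_class, mirror_cache and cache_address are hypothetical names that only imitate the layout, they are not runtime types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of cache_t on a 64-bit build: 8 + 4 + 2 + 2 = 16 bytes. */
struct mirror_cache {
    uintptr_t buckets_and_maybe_mask; /* 8 bytes */
    uint32_t  maybe_mask;             /* 4 bytes */
    uint16_t  flags;                  /* 2 bytes */
    uint16_t  occupied;               /* 2 bytes */
};

/* Hypothetical mirror of the first members of objc_class. */
struct mirror_class {
    void *isa;                 /* 8 bytes, offset 0x00 */
    void *superclass;          /* 8 bytes, offset 0x08 */
    struct mirror_cache cache; /* offset 0x10 */
    uintptr_t bits;
};

/* Compute the address of cache from the class address, as done by hand in lldb. */
static uintptr_t cache_address(uintptr_t class_address) {
    assert(class_address != 0);
    return class_address + offsetof(struct mirror_class, cache);
}
```

For the class address printed above, 0x00000001000084f0, this yields 0x0000000100008500, the value that was cast to cache_t * in the session.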

Now call an instance method from lldb, [p saySomething], and continue debugging.

lldb:

(lldb) p [p saySomething] // call the saySomething method
2021-07-04 02:37:14.269170+0800 KCObjcBuild[26446:4843266] -[LGPerson saySomething]
(lldb) p *$2
(cache_t) $6 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic = {
      Value = 4316269184
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic = {
          Value = 7 // now has a value
        }
      }
      _flags = 32808
      _occupied = 1 // now has a value
    }
    _originalPreoptCache = {
      std::__1::atomic = {
        Value = 0x0001802800000007
      }
    }
  }
}
(lldb) p/x $6.buckets()
(bucket_t *) $7 = 0x0000000101450a80
(lldb) p *$7
(bucket_t) $8 = {
  _sel = {
    std::__1::atomic = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 0
    }
  }
}
(lldb) p *($7+1)
(bucket_t) $9 = {
  _sel = {
    std::__1::atomic = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 0
    }
  }
}
(lldb) p *($7+2)
(bucket_t) $10 = {
  _sel = {
    std::__1::atomic = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 0
    }
  }
}
(lldb) p *($7+3)
(bucket_t) $11 = {
  _sel = {
    std::__1::atomic = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 48416 // keep going until a bucket with a non-zero Value shows up
    }
  }
}
(lldb) p $11.sel() // get the SEL through sel()
(SEL) $12 = "saySomething"
(lldb) p $11.imp(nil,pClass) // get the IMP through imp(nil, class)
(IMP) $13 = 0x00000001000039d0 (KCObjcBuild`-[LGPerson saySomething])
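
Summary:

After saySomething is called, _maybeMask and _occupied receive values, so these two members must be related to caching
bucket_t provides the sel() and imp(nil, pClass) methods
The sel and imp of saySomething are stored in a bucket, which in turn lives in the cache

Analyzing `cache` outside the source environment

The lldb session above gave us a fairly clear picture of the structure of cache_t. We can write our own set of structs that imitates the layout of cache_t, so that lldb in the source environment is no longer needed. To call another method we just add code and run again, which is the workflow we know best.

Code:

LGPerson: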

#import <Foundation/Foundation.h>

@interface LGPerson : NSObject
@property (nonatomic, copy) NSString *lgName;
@property (nonatomic, strong) NSString *nickName;

- (void)say1;
- (void)say2;
- (void)say3;
- (void)say4;
- (void)say5;
- (void)say6;
- (void)say7;

+ (void)sayHappy;

@end

@implementation LGPerson

- (void)say1 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say2 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say3 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say4 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say5 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say6 {
    NSLog(@"LGPerson say : %s", __func__);
}
- (void)say7 {
    NSLog(@"LGPerson say : %s", __func__);
}

+ (void)sayHappy {
    NSLog(@"LGPerson say : %s", __func__);
}
@end


main:

#import <Foundation/Foundation.h>
#import "LGPerson.h"
#import <objc/runtime.h>

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
struct kc_bucket_t {
    SEL _sel;
    IMP _imp;
};
struct kc_cache_t {
    struct kc_bucket_t *_buckets; // 8
    mask_t    _maybeMask; // 4
    uint16_t  _flags;  // 2
    uint16_t  _occupied; // 2
};
struct kc_class_data_bits_t {
    uintptr_t bits;
};
// cache class
struct kc_objc_class {
    Class isa; // inherited from objc_object in the runtime, must be declared by hand here
    Class superclass;
    struct kc_cache_t cache;             // formerly cache pointer and vtable
    struct kc_class_data_bits_t bits;
};
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        LGPerson *p  = [LGPerson alloc];
        Class pClass = p.class;  // objc_class
        [p say1];
        [p say2];
        //[p say3];
        //[p say4];
        //[p say1];
        //[p say2];
        //[p say3];
        //[pClass sayHappy];
        struct kc_objc_class *kc_class = (__bridge struct kc_objc_class *)(pClass);
        NSLog(@"%hu - %u", kc_class->cache._occupied, kc_class->cache._maybeMask);
        // 0 - 8136976 count
        // 1 - 3
        // 1: the source cannot be debugged
        // 2: LLDB
        // 3: small-scale sampling

        // underlying behavior
        // a: 1-3 -> 1 - 7
        // b: (null) - 0x0 where did the methods go???
        // c: 2 - 7 + say4 - 0xb850 + no class method
        // d: NSObject superclass

        for (mask_t i = 0; i < kc_class->cache._maybeMask; i++) {
            struct kc_bucket_t bucket = kc_class->cache._buckets[i];
            // the "f" after %p is printed literally, which is why every address in the log ends in "f"
            NSLog(@"%@ - %pf", NSStringFromSelector(bucket._sel), bucket._imp);
        }
        NSLog(@"Hello, World!");
    }
    return 0;
}

Console output:

2021-07-04 13:27:58.629469+0800 003-cache_t脱离源码环境分析[27782:4884791] LGPerson say : -[LGPerson say1]
2021-07-04 13:28:08.029414+0800 003-cache_t脱离源码环境分析[27782:4884791] LGPerson say : -[LGPerson say2]
2021-07-04 13:28:08.029963+0800 003-cache_t脱离源码环境分析[27782:4884791] 2 - 3
2021-07-04 13:28:08.030417+0800 003-cache_t脱离源码环境分析[27782:4884791] say1 - 0xb858f
2021-07-04 13:28:08.030502+0800 003-cache_t脱离源码环境分析[27782:4884791] say2 - 0xb808f
2021-07-04 13:28:08.030545+0800 003-cache_t脱离源码环境分析[27782:4884791] (null) - 0x0f
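
Conclusion:

In the runtime, objc_class inherits its Class ISA from objc_object, so the custom struct kc_objc_class has to declare Class isa explicitly. Without it every member is shifted by 8 bytes and the cast reads the wrong fields

Uncomment say3 and say4 in main;

then look at the console output again: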

2021-07-04 13:47:14.016817+0800 003-cache_t脱离源码环境分析[28227:4896303] LGPerson say : -[LGPerson say1]
2021-07-04 13:47:19.322219+0800 003-cache_t脱离源码环境分析[28227:4896303] LGPerson say : -[LGPerson say2]
2021-07-04 13:47:19.322786+0800 003-cache_t脱离源码环境分析[28227:4896303] LGPerson say : -[LGPerson say3]
2021-07-04 13:47:19.322873+0800 003-cache_t脱离源码环境分析[28227:4896303] LGPerson say : -[LGPerson say4]
2021-07-04 13:47:19.322941+0800 003-cache_t脱离源码环境分析[28227:4896303] 2 - 7
2021-07-04 13:47:19.323424+0800 003-cache_t脱离源码环境分析[28227:4896303] say4 - 0xb9b8f
2021-07-04 13:47:19.323499+0800 003-cache_t脱离源码环境分析[28227:4896303] (null) - 0x0f
2021-07-04 13:47:19.323593+0800 003-cache_t脱离源码环境分析[28227:4896303] say3 - 0xb9e8f
2021-07-04 13:47:19.323660+0800 003-cache_t脱离源码环境分析[28227:4896303] (null) - 0x0f
2021-07-04 13:47:19.323725+0800 003-cache_t脱离源码环境分析[28227:4896303] (null) - 0x0f
2021-07-04 13:47:19.323784+0800 003-cache_t脱离源码环境分析[28227:4896303] (null) - 0x0f
2021-07-04 13:47:19.323845+0800 003-cache_t脱离源码环境分析[28227:4896303] (null) - 0x0f
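
Open questions:

What do _occupied and _maybeMask do?
Why have say1 and say2 disappeared?
Why is the cache out of order? For example say4 comes first, and why are the second and fourth slots empty?
To find out what _occupied and _maybeMask are, we have to read the source, see where they are assigned, and work out how a cached method is inserted into a bucket.

Exploring the `cache_t` source

First we find the entry point for caching a method in cache_t, insert(SEL sel, IMP imp, id receiver). It takes sel and imp as parameters and is named insert, so we look at its implementation. insert contains a lot of code, so it is explained step by step.

Source in objc-cache.mm: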

void cache_t::insert(SEL sel, IMP imp, id receiver)
{
    runtimeLock.assertLocked();

    // Never cache before +initialize is done
    if (slowpath(!cls()->isInitialized())) {
        return;
    }

    if (isConstantOptimizedCache()) {
        _objc_fatal("cache_t::insert() called with a preoptimized cache for %s",
                    cls()->nameForLogging());
    }

#if DEBUG_TASK_THREADS
    return _collecting_in_critical();
#else
#if CONFIG_USE_CACHE_LOCK
    mutex_locker_t lock(cacheUpdateLock);
#endif

    ASSERT(sel != 0 && cls()->isInitialized());

    // Use the cache as-is if until we exceed our expected fill ratio.
    // occupied() returns the current count; on the first insert occupied = 0, so newOccupied = 1
    mask_t newOccupied = occupied() + 1;
    // number of buckets; on the first insert oldCapacity = 0 and capacity = 0
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) { // the cache is still empty; rare, only true on the first insert
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE; // 1 << 2 = 4: the first allocation holds 4 buckets
        reallocate(oldCapacity, capacity, /* freeOld */false); // oldCapacity = 0, capacity = 4, freeOld = false
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) { // newOccupied + 1 <= capacity * 3 / 4
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
        // With capacity = 4 (the first insert already went through the branch above):
        //   second insert: newOccupied = 2, 2 + 1 <= 3 holds, nothing to do here
        //   third insert:  newOccupied = 3, 3 + 1 <= 3 fails, so the growth branch below runs
        // As long as the count stays within capacity * 3 / 4 nothing is done and execution continues
    }
#if CACHE_ALLOW_FULL_UTILIZATION // when filling the cache completely is allowed, no slot is kept free
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
        // (FULL_UTILIZATION_CACHE_SIZE = 1 << 3 = 8) && (newOccupied + CACHE_END_MARKER <= capacity)
        // e.g. newOccupied = 7, capacity = 8 with an end marker of 1: 7 + 1 <= 8 holds, so the cache is filled completely
    }
#endif
    else { // the 3/4 limit is exceeded, e.g. 4 * 2 = 8
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE; // double the capacity if it is non-zero, otherwise capacity = 4
        if (capacity > MAX_CACHE_SIZE) { // MAX_CACHE_SIZE = 1 << 16, which matches the 16-bit maxMask seen earlier
            capacity = MAX_CACHE_SIZE; // clamp to the maximum
        }
        reallocate(oldCapacity, capacity, true); // allocate new memory; freeOld = true, so the old buckets are reclaimed
    }

    bucket_t *b = buckets(); // address of the first bucket, i.e. the start of the memory buckets() points to
    mask_t m = capacity - 1; // 4 - 1 = 3: mask = capacity - 1
    mask_t begin = cache_hash(sel, m); // hash index computed from sel and mask
    mask_t i = begin; // starting position

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) { // the current bucket is empty
            incrementOccupied(); // _occupied++: every cached bucket raises _occupied by 1, so _occupied equals the number of used buckets
            b[i].set<Atomic, Encoded>(b, sel, imp, cls()); // write sel and imp into the bucket: the method is now cached
            return;
        }
        if (b[i].sel() == sel) { // the method is already in the cache, nothing to do
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin)); // hash collision (different sel, same index): move to the next index until we are back at begin

    bad_cache(receiver, (SEL)sel); // no slot found: the cache is corrupt
#endif // !DEBUG_TASK_THREADS
}

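Analyzing `insert`

Computing the currently occupied size

Figure: capacity computation in insert

Conclusions:

occupied() returns the number of occupied slots, that is, how many buckets the cache already holds
newOccupied = occupied() + 1 is the position of the entry that is about to be cached
oldCapacity is kept so that the old memory can be released when the cache grows

Allocating capacity

Figure: allocation on the first insert

Conclusions:

Capacity is allocated from scratch only the first time a method is cached. The default is capacity = INIT_CACHE_SIZE, that is capacity = 4, the memory for 4 buckets
reallocate(oldCapacity, capacity, /* freeOld */false) allocates the memory; freeOld controls whether the old memory is released

Exploring reallocate

Code: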

ALWAYS_INLINE
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld)
{
    bucket_t *oldBuckets = buckets(); // start address of the old buckets
    bucket_t *newBuckets = allocateBuckets(newCapacity); // start address of the newly allocated buckets

    // Cache's old contents are not propagated.
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    ASSERT(newCapacity > 0);
    ASSERT((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
    // set buckets and mask: buckets holds the start address of newBuckets, mask holds newCapacity - 1
    // _occupied becomes 0 because the memory is freshly allocated
    setBucketsAndMask(newBuckets, newCapacity - 1);
    // if freeOld is true, release the old memory
    if (freeOld) {
        collect_free(oldBuckets, oldCapacity);
    }
}

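Conclusions:

reallocate does three things:
1. allocateBuckets allocates the memory
2. setBucketsAndMask sets mask and buckets
3. collect_free releases the old memory, if freeOld says so

Because the old contents are not copied into the new buckets, every method cached before a growth step is gone afterwards. This is why say1 and say2 disappeared in the example above.

Exploring `allocateBuckets`

allocateBuckets source: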

size_t cache_t::bytesForCapacity(uint32_t cap)
{
    return sizeof(bucket_t) * cap; // 1. size of bucket_t * cap
}

#if CACHE_END_MARKER // x86_64 macOS and simulators
bucket_t *cache_t::endMarker(struct bucket_t *b, uint32_t cap)
{
    return (bucket_t *)((uintptr_t)b + bytesForCapacity(cap)) - 1; // 2. (start address + allocated size) - 1: address of the last slot
}
bucket_t *cache_t::allocateBuckets(mask_t newCapacity)
{
    // Allocate one extra bucket to mark the end of the list.
    // This can't overflow mask_t because newCapacity is a power of 2.
    bucket_t *newBuckets = (bucket_t *)calloc(bytesForCapacity(newCapacity), 1); // 1. allocate newCapacity * sizeof(bucket_t) bytes
    bucket_t *end = endMarker(newBuckets, newCapacity); // 2. address of the bucket in the last slot
#if __arm__
    // End marker's sel is 1 and imp points BEFORE the first bucket.
    // This saves an instruction in objc_msgSend.
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)(newBuckets - 1), nil);
#else
    // End marker's sel is 1 and imp points to the first bucket.
    // The last bucket gets sel = 1 and imp = address of the first bucket, so the last slot is occupied from the start
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil);
#endif
    if (PrintCaches) recordNewCache(newCapacity); // record the new cache

    return newBuckets;
}
#else

Conclusions:

allocateBuckets does two things:
calloc(bytesForCapacity(newCapacity), 1) allocates newCapacity * sizeof(bucket_t) bytes
end->set writes sel = 1 and imp = address of the first bucket into the last slot of the allocated memory
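The effect of the end marker on the usable capacity can be sketched in plain C. This is only an illustration of the x86_64 variant described above: sketch_bucket, allocate_buckets and free_slots are hypothetical names, and plain integers stand in for SEL and IMP.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for bucket_t: plain integers instead of SEL and IMP. */
struct sketch_bucket {
    uintptr_t sel;
    uintptr_t imp;
};

/* Allocate `capacity` zeroed buckets and turn the last one into an end marker
 * (sel = 1, imp = address of the first bucket).
 * The caller owns the returned memory and releases it with free(). */
static struct sketch_bucket *allocate_buckets(uint32_t capacity) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0); /* power of two */
    struct sketch_bucket *buckets = calloc(capacity, sizeof *buckets);
    if (buckets == NULL) return NULL;
    struct sketch_bucket *end = buckets + capacity - 1;
    end->sel = 1;
    end->imp = (uintptr_t)buckets;
    return buckets;
}

/* Count the slots that can still take a method (sel == 0). */
static uint32_t free_slots(const struct sketch_bucket *buckets, uint32_t capacity) {
    uint32_t n = 0;
    for (uint32_t i = 0; i < capacity; i++) {
        if (buckets[i].sel == 0) n++;
    }
    return n;
}
```

With a capacity of 4, only three slots remain for methods, which is why the loop in the earlier main example, bounded by _maybeMask = 3, prints three buckets.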

Exploring `setBucketsAndMask`

Source:

#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED

void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds).
    // It is unsafe for objc_msgSend to see old buckets and new mask.
    // Therefore we write new buckets, wait a lot, then write new mask.
    // objc_msgSend reads mask first, then buckets.
#ifdef __arm__ // 32-bit ARM
    // ensure other threads see buckets contents before buckets pointer
    mega_barrier(); // memory barrier: earlier writes become visible to other threads first
    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_relaxed);

    // ensure other threads see new buckets before new mask
    mega_barrier();

    _maybeMask.store(newMask, memory_order_relaxed);
    _occupied = 0;
#elif __x86_64__ || i386 // macOS and simulators
    // ensure other threads see buckets contents before buckets pointer
    // write to _bucketsAndMaybeMask; (uintptr_t)newBuckets is the start of the memory buckets() points to (the address of the first bucket)
    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_release);
    // ensure other threads see new buckets before new mask
    // write to _maybeMask
    _maybeMask.store(newMask, memory_order_release);
    _occupied = 0;
#else
#error Don't know how to do setBucketsAndMask on this architecture.
#endif
}

Conclusion:

setBucketsAndMask writes to _bucketsAndMaybeMask and _maybeMask in a way that depends on the architecture, and resets _occupied to 0

Exploring `collect_free`

collect_free source:

void cache_t::collect_free(bucket_t *data, mask_t capacity)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    runtimeLock.assertLocked();
#endif

    if (PrintCaches) recordDeadCache(capacity);

    _garbage_make_room(); // make room in the garbage list
    garbage_byte_size += cache_t::bytesForCapacity(capacity); // add the size of the old buckets to the garbage total
    garbage_refs[garbage_count++] = data; // append the old buckets pointer to the garbage list
    cache_t::collectNolock(false); // try to free the collected garbage and reclaim the memory
}

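Conclusion:

collect_free queues the old buckets as garbage and reclaims the memory

Capacity at or below 3/4

Conclusions:

When the methods to be cached occupy no more than 3/4 of the total capacity, insert goes straight on to the caching step
This reflects a design habit of Apple's. After exploring a lot of low-level code you notice that Apple tends to leave some headroom: partly for later optimization or extension, partly for safety. Memory alignment follows the same idea

Filling the capacity completely

Conclusions:

Apple provides a switch for this (CACHE_ALLOW_FULL_UTILIZATION), which is considerate: the cache can be filled completely if needed, but by default it is not
My personal advice is to keep the default and not fill the cache completely, since doing so may cause other problems that are hard to track down

Capacity above 3/4

Conclusions:

Once the capacity exceeds 3/4, the cache doubles in size. The capacity never grows beyond MAX_CACHE_SIZE = 2^16, which corresponds to the 16-bit maximum of mask
Growing involves one important step: new memory is allocated and the old memory is released and reclaimed, with freeOld = true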

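Caching the method

Conclusions:

First, buckets() gives the start address of the allocated memory, which is the address of the first bucket. What buckets() points to is neither an array type nor a linked list, just one contiguous block of memory
The hash function computes the index from the sel and the mask. Why is mask needed? mask = capacity - 1 keeps the index within 0 to capacity - 1. On platforms with an end marker the last slot is already taken by the marker, so with capacity = 4 a method can only go into one of the other 3 slots
Then the method is cached. If the current slot is empty, the method is stored there. If the slot already holds the same sel, the method was cached before and insert returns. If there is a hash collision (same index, different sel), cache_next yields the next index to try, and caching continues once a free slot is found

incrementOccupied

Source: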

void cache_t::incrementOccupied() 
{
    _occupied++;
}


Conclusion:

_occupied is incremented by 1. _occupied is the number of methods currently stored in the cache

`cache_hash` and `cache_next`

cache_hash source:

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES // device
    value ^= value >> 7;
#endif
    return (mask_t)(value & mask); // bitwise AND with mask
}


cache_next source:

#if CACHE_END_MARKER // __arm__  ||  __x86_64__  ||  __i386__
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}
#elif __arm64__ // device
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}
#else
#error unexpected configuration
#endif


Conclusion:

cache_hash produces the hash index, and cache_next resolves hash collisions by stepping to the next index
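How the two functions work together, and why the cached methods appear out of order, can be seen in a small open-addressing sketch. It assumes the end-marker variant of cache_next and uses integers in place of selectors; sketch_hash, sketch_next and sketch_store are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Index for a selector address: keep only the low bits selected by mask. */
static uint32_t sketch_hash(uintptr_t sel, uint32_t mask) {
    return (uint32_t)(sel & mask);
}

/* Next index to try after a collision (the CACHE_END_MARKER variant). */
static uint32_t sketch_next(uint32_t i, uint32_t mask) {
    return (i + 1) & mask;
}

/* Store sel in the first free slot, probing from its hash index.
 * slots[] has mask + 1 entries and 0 means empty.
 * Returns the index used, or -1 if no slot is available. */
static int sketch_store(uintptr_t *slots, uint32_t mask, uintptr_t sel) {
    assert(sel != 0);
    uint32_t begin = sketch_hash(sel, mask);
    uint32_t i = begin;
    do {
        if (slots[i] == 0 || slots[i] == sel) {
            slots[i] = sel;
            return (int)i;
        }
    } while ((i = sketch_next(i, mask)) != begin);
    return -1;
}
```

The slot a method ends up in depends on the low bits of its selector address and on which slots are already taken, not on the order of the calls. That is why say4 can sit in front of say3 with empty slots in between.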

Writing into the cache: `set`

Source:

// macOS or simulator
template<Atomicity atomicity, IMPEncoding impEncoding>
void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
    ASSERT(_sel.load(memory_order_relaxed) == 0 ||
           _sel.load(memory_order_relaxed) == newSel);

    // objc_msgSend uses sel and imp with no locks.
    // It is safe for objc_msgSend to see new imp but NULL sel
    // (It will get a cache miss but not dispatch to the wrong place.)
    // It is unsafe for objc_msgSend to see old imp and new sel.
    // Therefore we write new imp, wait a lot, then write new sel.
    // encode the original imp (XOR with the class) and convert it to uintptr_t
    uintptr_t newIMP = (impEncoding == Encoded
                        ? encodeImp(base, newImp, newSel, cls)
                        : (uintptr_t)newImp);

    if (atomicity == Atomic) { // atomic path
        _imp.store(newIMP, memory_order_relaxed);

        if (_sel.load(memory_order_relaxed) != newSel) {
#ifdef __arm__
            mega_barrier();
            _sel.store(newSel, memory_order_relaxed);
#elif __x86_64__ || __i386__
            _sel.store(newSel, memory_order_release);
#else
#error Don't know how to do bucket_t::set on this architecture.
#endif
        }
    } else {
        _imp.store(newIMP, memory_order_relaxed); // write _imp
        _sel.store(newSel, memory_order_relaxed); // write _sel
    }
}

Conclusion:

set writes sel and imp into the bucket, which is the moment the method becomes cached
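The ordering described in the source comments, imp first and sel second, can be expressed with C11 atomics. This is only a sketch of the idea under simplifying assumptions: integers stand in for SEL and IMP, there is no IMP encoding, and the reader is ordinary C rather than the assembly in objc_msgSend. sketch_bucket, sketch_set and sketch_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Simplified bucket: both fields are atomics so they can be read without a lock. */
struct sketch_bucket {
    _Atomic uintptr_t sel;
    _Atomic uintptr_t imp;
};

/* Writer: store imp first, then publish sel with release ordering,
 * so a reader that sees the new sel also sees the new imp. */
static void sketch_set(struct sketch_bucket *b, uintptr_t sel, uintptr_t imp) {
    assert(sel != 0);
    atomic_store_explicit(&b->imp, imp, memory_order_relaxed);
    atomic_store_explicit(&b->sel, sel, memory_order_release);
}

/* Reader: return the imp if the bucket holds sel, or 0 on a cache miss. */
static uintptr_t sketch_lookup(struct sketch_bucket *b, uintptr_t sel) {
    if (atomic_load_explicit(&b->sel, memory_order_acquire) != sel) return 0;
    return atomic_load_explicit(&b->imp, memory_order_relaxed);
}
```

A reader that runs between the two stores sees the new imp with an empty sel and simply reports a miss, which is the safe outcome the source comments describe.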

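`insert` call flow

With Xcode's assembly debugging turned off, we look at how calling an instance method ends up in the cache's insert method. Set a breakpoint inside insert and run the source project.

Conclusion:

The stack trace shows the path to insert: _objc_msgSend_uncached --> lookUpImpOrForward --> log_and_fill_cache --> cache_t::insert

The stack trace only goes back as far as _objc_msgSend_uncached, yet what we called was [p say1], an instance method, and it ended up in cache_t::insert. So the part from _objc_msgSend_uncached to cache_t::insert is known, while the part from [p say1] to _objc_msgSend_uncached is not. To see it, turn on Xcode's assembly debugging and follow the disassembly.

Conclusions:

[p say1] is implemented underneath by objc_msgSend, the message-sending function, which is covered in the next article
Full path to insert: [p say1] --> objc_msgSend --> _objc_msgSend_uncached --> lookUpImpOrForward --> log_and_fill_cache --> cache_t::insert

`insert` call flow diagram

graph LR
A[method call] -.-> B(objc_msgSend) -.-> C(_objc_msgSend_uncached) -.-> D(lookUpImpOrForward) -.-> E(log_and_fill_cache) -.-> F(cache_t::insert)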

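Supplement:

Side notes:

For _sel and _imp, see the previous article
Hash tables make insertion and deletion convenient (more on this later)
Arrays are looked up by index
Linked lists are good at linking elements together
Hash function: a shift (>>) or modulo (%) turns the key into an index, and the index leads to the data
8 % 5 = 3 and 13 % 5 = 3: two different keys can land on the same index, which is a hash collision
3/4 of the capacity is a load factor of 0.75, a trade-off between space utilization and hash collisions. Compare hash maps that fall back to linked lists plus red-black trees when collisions become too frequent

`bucket` structure `lldb` debugging

Key point: _bucketsAndMaybeMask stores the address of the first bucket

lldb session: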

2021-07-03 21:25:55.822979+0800 KCObjcBuild[21684:4692396] LGPerson say : -[LGPerson say1]
KCObjcBuild was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) p/x LGPerson.class
(Class) $0 = 0x0000000100008510 LGPerson
(lldb) p (cache_t *)0x0000000100008520
(cache_t *) $1 = 0x0000000100008520
(lldb) p *$1
(cache_t) $2 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic = {
      Value = 4301421904
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic = {
          Value = 3
        }
      }
      _flags = 32808
      _occupied = 1
    }
    _originalPreoptCache = {
      std::__1::atomic = {
        Value = 0x0001802800000003
      }
    }
  }
}
(lldb) p $2.buckets()
(bucket_t *) $3 = 0x0000000100627d50
(lldb) p *$3
(bucket_t) $4 = {
  _sel = {
    std::__1::atomic = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 0
    }
  }
}
(lldb) p $3[1]
(bucket_t) $5 = {
  _sel = {
    std::__1::atomic = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic = {
      Value = 48912
    }
  }
}
(lldb) p $5.sel()
(SEL) $6 = "say1"
(lldb) p $3+1
(bucket_t *) $7 = 0x0000000100627d60
(lldb) p $7->sel()
(SEL) $8 = "say1"
(lldb) p $2._bucketsAndMaybeMask
(explicit_atomic) $9 = {
  std::__1::atomic = {
    Value = 4301421904
  }
}
(lldb) p/x 4301421904
(long) $10 = 0x0000000100627d50
(lldb) 
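
Conclusion:

buckets() ($3) is the value stored in _bucketsAndMaybeMask: both are 0x0000000100627d50

Notes on `bucketMask`

When working with bucketMask, pay attention to the byte order of the platform (x86_64, arm64 and so on). Both x86_64 and arm64 on Apple platforms are little-endian
Big-endian: the bytes are read from left to right
Little-endian: the bytes are read from right to left

Example:

(lldb) x p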
0x100661c70: 11 85 00 00 01 80 1d 01 00 00 00 00 00 00 00 00  ................
0x100661c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
(lldb) 

Conclusion:

Take the first 8 bytes as the address
Big-endian: 0x1185000001801d01
Little-endian: 0x011d800100008511
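The two readings of the memory dump can be reproduced with a few lines of C. The function names read_le64 and read_be64 are hypothetical; the byte values are the first 8 bytes of the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret 8 bytes as a little-endian 64-bit value: bytes[0] is the least significant byte. */
static uint64_t read_le64(const uint8_t bytes[8]) {
    assert(bytes != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) v |= (uint64_t)bytes[i] << (8 * i);
    return v;
}

/* Interpret the same bytes as big-endian: bytes[0] is the most significant byte. */
static uint64_t read_be64(const uint8_t bytes[8]) {
    assert(bytes != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) v = (v << 8) | bytes[i];
    return v;
}
```

Feeding in the bytes 11 85 00 00 01 80 1d 01 gives 0x011d800100008511 for the little-endian reading and 0x1185000001801d01 for the big-endian one, matching the two values listed above.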
