C++ internal

初步理解c++的类以及常用STL的内部结构

C++ class

在平时的c语言课程中,我们几乎只用到了struct的成员变量,后来我们才知道struct其实和class差不多,也可以有自己的成员函数构造函数析构函数,可以封装多态继承,唯一的区别就是struct的访问控制默认是public,而classprivate,现在我们来了解一下c++的class

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>

class foo
{
public:
foo() {
printf("foo default ctor\n");
a = 114;
b = 514;
}
foo(int x, int y) : a(x), b(y)
{
printf("foo ctor\n\n");
}
~foo() { printf("foo dtor\n\n"); };
public:
void member() {
printf("foo member function\n");
printf("%d %d\n\n", a, b);
}
void fun() { printf("fun_foo\n\n"); }
private:
int a;
int b;
};

class bar : public foo
{
public:
bar() {
printf("bar ctor\n\n");
x = 1919;
y = 810;
}
~bar() { printf("bar dtor\n"); }
public:
void member() {
printf("bar member function\n");
printf("%d %d\n\n", x, y);
}
void fun() { printf("fun_bar\n\n"); }
private:
int x;
int y;
};

int main()
{
foo f0(114, 514);
bar b0;

foo* foo_ptr = &f0;
foo_ptr->member();

foo* bar_ptr = &b0;
bar_ptr->member();

foo_ptr->fun();
bar_ptr->fun();
return 0;
}

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
foo ctor

foo default ctor
bar ctor

foo member function
114 514

foo member function
114 514

fun_foo

fun_foo

bar dtor
foo dtor

foo dtor

member function(成员函数)

1
2
3
4
5
6
7
8
9
10
// foo:
virtual void member() {
printf("foo member function\n");
printf("%d %d\n", a, b);
}
// bar:
void member() {
printf("bar member function\n");
printf("%d %d\n", x, y);
}

顾名思义,就是struct或者class内部的成员,写在类内部的函数。根据访问控制tag不同,决定能否在外部调用,就像foo private下的变量是无法在外部函数通过foo.a来调用的

ctor & dtor(构造 & 析构)

1
2
3
4
5
6
7
8
9
10
11
12
// foo:
foo() {}
foo(int x, int y) : a(x), b(y)
{ printf("foo ctor\n"); }
~foo() { printf("foo dtor\n"); };
// bar:
bar() {
printf("bar ctor\n");
x = 1919;
y = 810;
}
~bar() { printf("bar dtor\n");

形如foo()foo(int x, int y)这种没有返回类型的与类同名的函数就是构造函数,通常在类对象生成时自动调用;

形如~bar()~foo()这种没有返回类型,与类同名并且在前面加上~的函数,就是析构函数,通常在类对象生命周期结束时自动调用;

这两类函数,如果没有编写,编译器会自动生成默认构造函数和析构函数

inheritance(继承)

1
class bar : public foo

bar继承自foo,是foo的派生类,这个public是访问修饰符access-specifier

当一个类派生自基类,该基类可以被继承为 public、protectedprivate 几种类型。继承类型是通过上面讲解的访问修饰符 access-specifier 来指定的。

我们几乎不使用 protectedprivate 继承,通常使用 public 继承。当使用不同类型的继承时,遵循以下几个规则:

  • 公有继承(public):当一个类派生自公有基类时,基类的公有成员也是派生类的公有成员,基类的保护成员也是派生类的保护成员,基类的私有成员不能直接被派生类访问,但是可以通过调用基类的公有保护成员来访问。
  • 保护继承(protected): 当一个类派生自保护基类时,基类的公有保护成员将成为派生类的保护成员。
  • 私有继承(private):当一个类派生自私有基类时,基类的公有保护成员将成为派生类的私有成员。

从输出结果可以看出,bar初始化的时候调用了foo的默认构造函数default ctor,然后调用了barctor,而且在最后变量销毁析构时,先调用了bardtor,然后调用了foodtor

这说明子类也会继承父类的私有变量,并且先构造父类的变量,然后构造子类的变量,最后先析构子类的变量,最后析构父类的变量,这刚好是相反的顺序。

Polymorphism(多态)

我们可以发现输出结果都输出了foo的成员函数, 这应该是错误的输出,因为编译器在调用函数前把这个成员函数设置为基类的版本。

但当我们在foo的成员函数前加上virtualtag,就能正常输出

1
2
3
4
foo member function
114 514
bar member function
1919 810

此时foo的成员函数声明为虚函数,编译器在调用函数时,根据调用的对象来动态确定这个函数的版本,这就是C++的多态

class internal

现在我们通过动态调试来看看class的内部情况

1

bar_ctor

通过ida可以发现

在构造对象时将vtable放在结构体的首地址

bar确实继承了foo类并且调用了foo default ctor

virtual function1

这两处call eax/edxfoobarmember函数调用

首先取类地址,存放到对应的指针变量处,然后取出vtable的值,最后取出vtable的第一个函数地址,调用该函数

vtable

虚函数表是存放在.rdata

virtual function2

这两处call eaxfoobarfun函数调用,仅仅只是在最后取函数地址时加了偏移

STL internal in windows

(本文仅讨论部分STL的内部结构和内存管理)

vector

memory management

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <vector>

void print(std::vector<int>& x)
{
unsigned int cnt = 1;
std::vector<int>::const_iterator cb = x.cbegin();
std::vector<int>::const_iterator ce = x.cend();

printf("size: %d, capacity: %d\n", x.size(), x.capacity());
for (; cb != ce; cb++) {
printf("%3d", *cb);
if (cnt++ % 10 == 0) {
printf("\n");
}
}
printf("\n");
}

int main()
{
std::vector<int> a;
print(a);

int tmp = a.capacity();
for (int i = 0; i < 20; i++)
{
a.push_back(i);
if (tmp != a.capacity()) {
print(a);
tmp = a.capacity();
}
}

return 0;
}

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
size: 0, capacity: 0

size: 1, capacity: 1
0
size: 2, capacity: 2
0 1
size: 3, capacity: 3
0 1 2
size: 4, capacity: 4
0 1 2 3
size: 5, capacity: 6
0 1 2 3 4
size: 7, capacity: 9
0 1 2 3 4 5 6
size: 10, capacity: 13
0 1 2 3 4 5 6 7 8 9

size: 14, capacity: 19
0 1 2 3 4 5 6 7 8 9
10 11 12 13
size: 20, capacity: 28
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19

可以发现在windows平台下的vector与linux的vector的内存策略不相同,在linux下,若vector的内存池满了,会直接申请两倍的内存池,并将原来的内容copy过去(若原来的内存池大小为0,会变成1)

push_back

1
2
3
4
5
6
7
8
_CONSTEXPR20 void push_back(const _Ty& _Val) { // insert element at end, provide strong guarantee
_Emplace_one_at_back(_Val);
}

_CONSTEXPR20 void push_back(_Ty&& _Val) {
// insert by moving into element at end, provide strong guarantee
_Emplace_one_at_back(_STD move(_Val));
}

push_back会根据_val的类型调用对应的**_Emplace_one_at_back**

1
2
3
4
5
6
7
8
9
10
11
12
template <class... _Valty>
_CONSTEXPR20 _Ty& _Emplace_one_at_back(_Valty&&... _Val) {
// insert by perfectly forwarding into element at end, provide strong guarantee
auto& _My_data = _Mypair._Myval2;
pointer& _Mylast = _My_data._Mylast;

if (_Mylast != _My_data._Myend) {
return _Emplace_back_with_unused_capacity(_STD forward<_Valty>(_Val)...);
}

return *_Emplace_reallocate(_Mylast, _STD forward<_Valty>(_Val)...);
}

当内存池满了,调用

_Emplace_reallocate(_Mylast, _STD forward<_Valty>(_Val)...)

1
2
3
4
// ...
const size_type _Newsize = _Oldsize + 1;
const size_type _Newcapacity = _Calculate_growth(_Newsize);
// ...

_Oldsize就是原来内存池的大小,然后_Newsize_Oldsize + 1后调用

_Calculate_growth(_Newsize)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
_CONSTEXPR20 size_type _Calculate_growth(const size_type _Newsize) const {
// given _Oldcapacity and _Newsize, calculate geometric growth
const size_type _Oldcapacity = capacity();
const auto _Max = max_size();

if (_Oldcapacity > _Max - _Oldcapacity / 2) {
return _Max; // geometric growth would overflow
}

const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2;

if (_Geometric < _Newsize) {
return _Newsize; // geometric growth would be insufficient
}

return _Geometric; // geometric growth is sufficient
}

看到这里我们就清楚了vector的内存管理,新内存池大小就是max(_Oldsize + 1, _Oldsize + _Oldsize / 2),这确实与上文的输出结果相匹配

internal

1
2
3
4
using _Alty        = _Rebind_alloc_t<_Alloc, _Ty>;
using _Scary_val = _Vector_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Ty>,
_Vec_iter_types<_Ty, size_type, difference_type, pointer, const_pointer, _Ty&, const _Ty&>>>;
_Compressed_pair<_Alty, _Scary_val> _Mypair;

vector就一个成员变量_Mypair,是_Compressed_pair模板类

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
template <class _Ty1, class _Ty2, bool = is_empty_v<_Ty1> && !is_final_v<_Ty1>>
class _Compressed_pair final : private _Ty1 { // store a pair of values, deriving from empty first
public:
_Ty2 _Myval2;

using _Mybase = _Ty1; // for visualization

template <class... _Other2>
constexpr explicit _Compressed_pair(_Zero_then_variadic_args_t, _Other2&&... _Val2) noexcept(
conjunction_v<is_nothrow_default_constructible<_Ty1>, is_nothrow_constructible<_Ty2, _Other2...>>)
: _Ty1(), _Myval2(_STD forward<_Other2>(_Val2)...) {}

template <class _Other1, class... _Other2>
constexpr _Compressed_pair(_One_then_variadic_args_t, _Other1&& _Val1, _Other2&&... _Val2) noexcept(
conjunction_v<is_nothrow_constructible<_Ty1, _Other1>, is_nothrow_constructible<_Ty2, _Other2...>>)
: _Ty1(_STD forward<_Other1>(_Val1)), _Myval2(_STD forward<_Other2>(_Val2)...) {}

constexpr _Ty1& _Get_first() noexcept {
return *this;
}

constexpr const _Ty1& _Get_first() const noexcept {
return *this;
}
};

可以看到_Compressed_pair派生于第一个模板参数_Alty,其实也就是_Compressed_pair就是一个内存配置器,通过_Get_first()可以得到内存配置器,内含第二个模板参数对象_Scary_val,命名为_Myval2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
template <class _Val_types>
class _Vector_val : public _Container_base {
public:
using value_type = typename _Val_types::value_type;
using size_type = typename _Val_types::size_type;
using difference_type = typename _Val_types::difference_type;
using pointer = typename _Val_types::pointer;
using const_pointer = typename _Val_types::const_pointer;
using reference = value_type&;
using const_reference = const value_type&;

_CONSTEXPR20 _Vector_val() noexcept : _Myfirst(), _Mylast(), _Myend() {}

_CONSTEXPR20 _Vector_val(pointer _First, pointer _Last, pointer _End) noexcept
: _Myfirst(_First), _Mylast(_Last), _Myend(_End) {}

_CONSTEXPR20 void _Swap_val(_Vector_val& _Right) noexcept {
this->_Swap_proxy_and_iterators(_Right);
_Swap_adl(_Myfirst, _Right._Myfirst);
_Swap_adl(_Mylast, _Right._Mylast);
_Swap_adl(_Myend, _Right._Myend);
}

_CONSTEXPR20 void _Take_contents(_Vector_val& _Right) noexcept {
this->_Swap_proxy_and_iterators(_Right);
_Myfirst = _Right._Myfirst;
_Mylast = _Right._Mylast;
_Myend = _Right._Myend;

_Right._Myfirst = nullptr;
_Right._Mylast = nullptr;
_Right._Myend = nullptr;
}

pointer _Myfirst; // pointer to beginning of array
pointer _Mylast; // pointer to current end of sequence
pointer _Myend; // pointer to end of array
};

可以看到_Vector_val派生于_Container_base,有三个指针对象_Myfirst, _Mylast, _Myend,这与linux的STL如出一辙。_Container_base没啥重要的东西,直接略过,这三个指针才是vector内存维护的关键

string

在本文中,我只介绍std::string, 使用 char 类型的元素将 basic_string 类模板的专用化描述为 string 的类型

memory management

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <string>

void print(std::string& s) {
printf("size: %d capacity: %d\n", s.size(), s.capacity());
}

int main()
{
char str[] = "Hello enllus1on";
std::string s;
std::string s1("114 514");
std::string s2 = "1919810";
char error[] = "bad bad bad bad errorrrrr";
std::string s3(error);
printf("%s\n", s1.c_str());
printf("%s\n", s2.c_str());
printf("%s\n", s3.c_str());
print(s1);
print(s2);
print(s3);
unsigned int idx = 0;
unsigned int tmp = s.capacity();
print(s);

for (int i = 0; i < 500; i++) {
s += str[idx];
idx = (idx + 1) % 15;
if (tmp != s.capacity()) {
tmp = s.capacity();
print(s);
}
}
return 0;
}

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
114 514
1919810
bad bad bad bad errorrrrr
size: 7 capacity: 15
size: 7 capacity: 15
size: 25 capacity: 31
size: 0 capacity: 15
size: 16 capacity: 31
size: 32 capacity: 47
size: 48 capacity: 70
size: 71 capacity: 105
size: 106 capacity: 157
size: 158 capacity: 235
size: 236 capacity: 352
size: 353 capacity: 528

默认构造

1
2
3
4
5
_CONSTEXPR20
basic_string() noexcept(is_nothrow_default_constructible_v<_Alty>) : _Mypair(_Zero_then_variadic_args_t{}) {
_Mypair._Myval2._Alloc_proxy(_GET_PROXY_ALLOCATOR(_Alty, _Getal()));
_Tidy_init();
}

_Tidy_init()

1
2
3
4
5
6
7
8
9
10
_CONSTEXPR20 void _Tidy_init() noexcept {
// initialize basic_string data members
auto& _My_data = _Mypair._Myval2;
_My_data._Mysize = 0;
_My_data._Myres = _BUF_SIZE - 1;
_My_data._Activate_SSO_buffer();

// the _Traits::assign is last so the codegen doesn't think the char write can alias this
_Traits::assign(_My_data._Bx._Buf[0], _Elem());
}

根据上面的输出结果,并且多次运行,我们可以发现_BUF_SIZE是固定值为0x10capacity也就是_Myres的值

用字符串构造,会自动计算字符串长度

1
2
3
_CONSTEXPR20 basic_string(_In_z_ const _Elem* const _Ptr) : _Mypair(_Zero_then_variadic_args_t{}) {
_Construct<_Construct_strategy::_From_ptr>(_Ptr, _Convert_size<size_type>(_Traits::length(_Ptr)));
}

然后我们看下这个_Construct函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
    template <_Construct_strategy _Strat, class _Char_or_ptr>
_CONSTEXPR20 void _Construct(const _Char_or_ptr _Arg, _CRT_GUARDOVERFLOW const size_type _Count) {
auto& _My_data = _Mypair._Myval2;
_STL_INTERNAL_CHECK(!_My_data._Large_string_engaged());

if constexpr (_Strat == _Construct_strategy::_From_char) {
_STL_INTERNAL_STATIC_ASSERT(is_same_v<_Char_or_ptr, _Elem>);
} else {
_STL_INTERNAL_STATIC_ASSERT(_Is_elem_cptr<_Char_or_ptr>::value);
}

if (_Count > max_size()) {
_Xlen_string(); // result too long
}

auto& _Al = _Getal();
auto&& _Alproxy = _GET_PROXY_ALLOCATOR(_Alty, _Al);
_Container_proxy_ptr<_Alty> _Proxy(_Alproxy, _My_data);

if (_Count < _BUF_SIZE) {
_My_data._Mysize = _Count;
_My_data._Myres = _BUF_SIZE - 1;

if constexpr (_Strat == _Construct_strategy::_From_char) {
_Traits::assign(_My_data._Bx._Buf, _Count, _Arg);
_Traits::assign(_My_data._Bx._Buf[_Count], _Elem());
} else if constexpr (_Strat == _Construct_strategy::_From_ptr) {
_Traits::copy(_My_data._Bx._Buf, _Arg, _Count);
_Traits::assign(_My_data._Bx._Buf[_Count], _Elem());
} else { // _Strat == _Construct_strategy::_From_string
#ifdef _INSERT_STRING_ANNOTATION
_Traits::copy(_My_data._Bx._Buf, _Arg, _Count + 1);
#else // ^^^ _INSERT_STRING_ANNOTATION / !_INSERT_STRING_ANNOTATION vvv
_Traits::copy(_My_data._Bx._Buf, _Arg, _BUF_SIZE);
#endif // !_INSERT_STRING_ANNOTATION
}

_Proxy._Release();
return;
}

_My_data._Myres = _BUF_SIZE - 1;
const size_type _New_capacity = _Calculate_growth(_Count);
const pointer _New_ptr = _Al.allocate(_New_capacity + 1); // throws
_Construct_in_place(_My_data._Bx._Ptr, _New_ptr);

_Start_element_lifetimes(_Unfancy(_New_ptr), _New_capacity + 1);

_My_data._Mysize = _Count;
_My_data._Myres = _New_capacity;
if constexpr (_Strat == _Construct_strategy::_From_char) {
_Traits::assign(_Unfancy(_New_ptr), _Count, _Arg);
_Traits::assign(_Unfancy(_New_ptr)[_Count], _Elem());
} else if constexpr (_Strat == _Construct_strategy::_From_ptr) {
_Traits::copy(_Unfancy(_New_ptr), _Arg, _Count);
_Traits::assign(_Unfancy(_New_ptr)[_Count], _Elem());
} else { // _Strat == _Construct_strategy::_From_string
_Traits::copy(_Unfancy(_New_ptr), _Arg, _Count + 1);
}

_ASAN_STRING_CREATE(*this);
_Proxy._Release();
}

从源码中我们可以得出

  • 当字符串的长度_Count_BUF_SIZE时,会直接记录到_Mysize(_Count)_Myres(_BUF_SIZE - 1),也就是sizecapacity的值,然后copy后return
  • 否则,会先记录_Myres_BUF_SIZE - 1调用_Calculate_growth(_Count)计算新的capacity,然后记录到_Mysize(_Count)_Myres(_New_capacity),copy后return

_Calculate_growth(_Count)

1
2
3
_NODISCARD _CONSTEXPR20 size_type _Calculate_growth(const size_type _Requested) const noexcept {
return _Calculate_growth(_Requested, _Mypair._Myval2._Myres, max_size());
}
1
2
3
4
5
6
7
8
9
10
11
12
13
_NODISCARD static _CONSTEXPR20 size_type _Calculate_growth(
const size_type _Requested, const size_type _Old, const size_type _Max) noexcept {
const size_type _Masked = _Requested | _ALLOC_MASK;
if (_Masked > _Max) { // the mask overflows, settle for max_size()
return _Max;
}

if (_Old > _Max - _Old / 2) { // similarly, geometric overflows
return _Max;
}

return (_STD max)(_Masked, _Old + _Old / 2);
}

上文中的_BUF_SIZE_ALLOC_MASK实际上都来源于_String_val

1
2
3
4
5
6
7
static constexpr size_type _BUF_SIZE = 16 / sizeof(value_type) < 1 ? 1 : 16 / sizeof(value_type);
// roundup mask for allocated buffers, [0, 15]:
static constexpr size_type _ALLOC_MASK = sizeof(value_type) <= 1 ? 15
: sizeof(value_type) <= 2 ? 7
: sizeof(value_type) <= 4 ? 3
: sizeof(value_type) <= 8 ? 1
: 0;

由于这里的value_typechar,所以_BUF_SIZE是0x10,_ALLOC_MASK是0xF

看到这我们就清楚了,_Count_BUF_SIZE - 1时,_New_capacity的值为max(_Count | 0xF, _Old_capacity + _Old_capacity >> 1)

internal

1
2
3
4
5
using _Alty        = _Rebind_alloc_t<_Alloc, _Elem>;
using _Scary_val = _String_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Elem>,
_String_iter_types<_Elem, typename _Alty_traits::size_type, typename _Alty_traits::difference_type,
typename _Alty_traits::pointer, typename _Alty_traits::const_pointer, _Elem&, const _Elem&>>>;
_Compressed_pair<_Alty, _Scary_val> _Mypair;

可以看到string也就只有_Compressed_pair这一个类成员

_Compressed_pair

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
template <class _Ty1, class _Ty2, bool = is_empty_v<_Ty1> && !is_final_v<_Ty1>>
class _Compressed_pair final : private _Ty1 { // store a pair of values, deriving from empty first
public:
_Ty2 _Myval2;

using _Mybase = _Ty1; // for visualization

template <class... _Other2>
constexpr explicit _Compressed_pair(_Zero_then_variadic_args_t, _Other2&&... _Val2) noexcept(
conjunction_v<is_nothrow_default_constructible<_Ty1>, is_nothrow_constructible<_Ty2, _Other2...>>)
: _Ty1(), _Myval2(_STD forward<_Other2>(_Val2)...) {}

template <class _Other1, class... _Other2>
constexpr _Compressed_pair(_One_then_variadic_args_t, _Other1&& _Val1, _Other2&&... _Val2) noexcept(
conjunction_v<is_nothrow_constructible<_Ty1, _Other1>, is_nothrow_constructible<_Ty2, _Other2...>>)
: _Ty1(_STD forward<_Other1>(_Val1)), _Myval2(_STD forward<_Other2>(_Val2)...) {}

constexpr _Ty1& _Get_first() noexcept {
return *this;
}

constexpr const _Ty1& _Get_first() const noexcept {
return *this;
}
};

vector一样,_Compressed_pair派生于第一个模板参数_Alty,内含第二个模板参数对象_Scary_val等同于_String_val类,命名为_Myval2

_String_val

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
template <class _Val_types>
class _String_val : public _Container_base {
public:
using value_type = typename _Val_types::value_type;
using size_type = typename _Val_types::size_type;
using difference_type = typename _Val_types::difference_type;
using pointer = typename _Val_types::pointer;
using const_pointer = typename _Val_types::const_pointer;
using reference = value_type&;
using const_reference = const value_type&;

_CONSTEXPR20 _String_val() noexcept : _Bx() {}

// length of internal buffer, [1, 16]:
static constexpr size_type _BUF_SIZE = 16 / sizeof(value_type) < 1 ? 1 : 16 / sizeof(value_type);
// roundup mask for allocated buffers, [0, 15]:
static constexpr size_type _ALLOC_MASK = sizeof(value_type) <= 1 ? 15
: sizeof(value_type) <= 2 ? 7
: sizeof(value_type) <= 4 ? 3
: sizeof(value_type) <= 8 ? 1
: 0;

_CONSTEXPR20 value_type* _Myptr() noexcept {
value_type* _Result = _Bx._Buf;
if (_Large_string_engaged()) {
_Result = _Unfancy(_Bx._Ptr);
}

return _Result;
}

_CONSTEXPR20 const value_type* _Myptr() const noexcept {
const value_type* _Result = _Bx._Buf;
if (_Large_string_engaged()) {
_Result = _Unfancy(_Bx._Ptr);
}

return _Result;
}

_CONSTEXPR20 bool _Large_string_engaged() const noexcept {
return _BUF_SIZE <= _Myres;
}

constexpr void _Activate_SSO_buffer() noexcept {
// start the lifetime of the array elements
#if _HAS_CXX20
if (_STD is_constant_evaluated()) {
for (size_type _Idx = 0; _Idx < _BUF_SIZE; ++_Idx) {
_Bx._Buf[_Idx] = value_type();
}
}
#endif // _HAS_CXX20
}

_CONSTEXPR20 void _Check_offset(const size_type _Off) const {
// checks whether _Off is in the bounds of [0, size()]
if (_Mysize < _Off) {
_Xran();
}
}

_CONSTEXPR20 void _Check_offset_exclusive(const size_type _Off) const {
// checks whether _Off is in the bounds of [0, size())
if (_Mysize <= _Off) {
_Xran();
}
}

[[noreturn]] static void _Xran() {
_Xout_of_range("invalid string position");
}

_CONSTEXPR20 size_type _Clamp_suffix_size(const size_type _Off, const size_type _Size) const noexcept {
// trims _Size to the longest it can be assuming a string at/after _Off
return (_STD min)(_Size, _Mysize - _Off);
}

union _Bxty { // storage for small buffer or pointer to larger one
// This constructor previously initialized _Ptr. Don't rely on the new behavior without
// renaming `_String_val` (and fixing the visualizer).
_CONSTEXPR20 _Bxty() noexcept : _Buf() {} // user-provided, for fancy pointers
_CONSTEXPR20 ~_Bxty() noexcept {} // user-provided, for fancy pointers

value_type _Buf[_BUF_SIZE];
pointer _Ptr;
char _Alias[_BUF_SIZE]; // TRANSITION, ABI: _Alias is preserved for binary compatibility (especially /clr)
} _Bx;

size_type _Mysize = 0; // current length of string
size_type _Myres = 0; // current storage reserved for string
};

所以,实际上string就三个成员变量_Bx_Mysize_Myres,其中_Bxunion联合体_Bxtyunion的长度按union成员中最长的计算,其实也就是0x10的大小

  • _Myres<_BUF_SIZE_Bx通过copyassign直接保存字符串内容
  • _Myres>=_BUF_SIZE_Bx保存的是预留内存的指针,这个可以从上文的_Construct函数中得出
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// _Count >= _BUF_SIZE
_My_data._Myres = _BUF_SIZE - 1;
const size_type _New_capacity = _Calculate_growth(_Count);
const pointer _New_ptr = _Al.allocate(_New_capacity + 1); // throws
_Construct_in_place(_My_data._Bx._Ptr, _New_ptr);

_Start_element_lifetimes(_Unfancy(_New_ptr), _New_capacity + 1);

_My_data._Mysize = _Count;
_My_data._Myres = _New_capacity;
if constexpr (_Strat == _Construct_strategy::_From_char) {
_Traits::assign(_Unfancy(_New_ptr), _Count, _Arg);
_Traits::assign(_Unfancy(_New_ptr)[_Count], _Elem());
} else if constexpr (_Strat == _Construct_strategy::_From_ptr) {
_Traits::copy(_Unfancy(_New_ptr), _Arg, _Count);
_Traits::assign(_Unfancy(_New_ptr)[_Count], _Elem());
} else { // _Strat == _Construct_strategy::_From_string
_Traits::copy(_Unfancy(_New_ptr), _Arg, _Count + 1);
}

先计算_New_capacity也就是新的预留空间大小_Myres,然后开辟_New_capacity + 1大小的空间,并将指针保存到_Bx,并将字符串的内容copyassign到开辟的内存

string_1

deque

internal

dequevectorstring不一样,不提供capacity接口,而且deque的内部数据结构更为复杂,我们直接从ida开始看

step_into_c++_deque_1

一共有5个成员变量,我们再配合源码分析每个成员的用途

1
2
3
4
5
6
7
8
// class deque
using _Scary_val = _Deque_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Deque_simple_types<_Ty>,
_Deque_iter_types<_Ty, typename _Alty_traits::size_type, typename _Alty_traits::difference_type,
typename _Alty_traits::pointer, typename _Alty_traits::const_pointer, _Ty&, const _Ty&, _Mapptr>>>;
static constexpr int _Minimum_map_size = 8;
static constexpr int _Block_size = _Scary_val::_Block_size;
// ...
_Compressed_pair<_Alty, _Scary_val> _Mypair;

deque同样也只有一个成员变量_Compressed_pair_Scary_val等同于_Deque_val

_Deque_val

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
template <class _Val_types>
class _Deque_val : public _Container_base12 {
public:
using value_type = typename _Val_types::value_type;
using size_type = typename _Val_types::size_type;
using difference_type = typename _Val_types::difference_type;
using pointer = typename _Val_types::pointer;
using const_pointer = typename _Val_types::const_pointer;
using reference = value_type&;
using const_reference = const value_type&;
using _Mapptr = typename _Val_types::_Mapptr;

private:
static constexpr size_t _Bytes = sizeof(value_type);

public:
static constexpr int _Block_size = _Bytes <= 1 ? 16
: _Bytes <= 2 ? 8
: _Bytes <= 4 ? 4
: _Bytes <= 8 ? 2
: 1; // elements per block (a power of 2)

_Deque_val() noexcept : _Map(), _Mapsize(0), _Myoff(0), _Mysize(0) {}

size_type _Getblock(size_type _Off) const noexcept {
// NB: _Mapsize and _Block_size are guaranteed to be powers of 2
return (_Off / _Block_size) & (_Mapsize - 1);
}

_Mapptr _Map; // pointer to array of pointers to blocks
size_type _Mapsize; // size of map array, zero or 2^N
size_type _Myoff; // offset of initial element
size_type _Mysize; // current length of sequence
};

_Deque_val的源码中,我们可以清楚的知道后四个成员变量的作用

1
2
3
4
_Mapptr _Map; // pointer to array of pointers to blocks
size_type _Mapsize; // size of map array, zero or 2^N
size_type _Myoff; // offset of initial element
size_type _Mysize; // current length of sequence

再跟进一下_Container_base12

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
struct _Container_base12;
struct _Container_proxy { // store head of iterator chain and back pointer
_CONSTEXPR20 _Container_proxy() noexcept = default;
_CONSTEXPR20 _Container_proxy(_Container_base12* _Mycont_) noexcept : _Mycont(_Mycont_) {}

const _Container_base12* _Mycont = nullptr;
mutable _Iterator_base12* _Myfirstiter = nullptr;
};

struct _Container_base12 {
public:
_CONSTEXPR20 _Container_base12() noexcept = default;

_Container_base12(const _Container_base12&) = delete;
_Container_base12& operator=(const _Container_base12&) = delete;

_CONSTEXPR20 void _Orphan_all() noexcept;
_CONSTEXPR20 void _Swap_proxy_and_iterators(_Container_base12&) noexcept;

template <class _Alloc>
_CONSTEXPR20 void _Alloc_proxy(_Alloc&& _Al) {
_Container_proxy* const _New_proxy = _Unfancy(_Al.allocate(1));
_Construct_in_place(*_New_proxy, this);
_Myproxy = _New_proxy;
_New_proxy->_Mycont = this;
}

template <class _Alloc>
_CONSTEXPR20 void _Reload_proxy(_Alloc&& _Old_alloc, _Alloc&& _New_alloc) {
// pre: no iterators refer to the existing proxy
_Container_proxy* const _New_proxy = _Unfancy(_New_alloc.allocate(1));
_Construct_in_place(*_New_proxy, this);
_New_proxy->_Mycont = this;
_Delete_plain_internal(_Old_alloc, _STD exchange(_Myproxy, _New_proxy));
}

_Container_proxy* _Myproxy = nullptr;

private:
_CONSTEXPR20 void _Orphan_all_unlocked_v3() noexcept;
_CONSTEXPR20 void _Swap_proxy_and_iterators_unlocked(_Container_base12&) noexcept;

void _Orphan_all_locked_v3() noexcept {
_Lockit _Lock(_LOCK_DEBUG);
_Orphan_all_unlocked_v3();
}

void _Swap_proxy_and_iterators_locked(_Container_base12& _Right) noexcept {
_Lockit _Lock(_LOCK_DEBUG);
_Swap_proxy_and_iterators_unlocked(_Right);
}
};

从这里我们可以_Myproxy_Container_base12类的成员变量,是_Container_proxy的指针类型,_Container_proxy有两个变量:_Mycont_Myfirstiter,一个是container的指针,一个是iterator的指针,至此,全部成员变量的作用就清晰明了了。

memory management

知道了deque成员变量的基本作用,我们可以开始通过常见的成员函数来了解deque的内存管理

push_back

1
2
3
4
void push_back(_Ty&& _Val) {
_Orphan_all();
_Emplace_back_internal(_STD move(_Val));
}

_Emplace_back_internal

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
   template <class... _Tys>
void _Emplace_back_internal(_Tys&&... _Vals) {
if ((_Myoff() + _Mysize()) % _Block_size == 0 && _Mapsize() <= (_Mysize() + _Block_size) / _Block_size) {
_Growmap(1);
}
_Myoff() &= _Mapsize() * _Block_size - 1;
size_type _Newoff = _Myoff() + _Mysize();
size_type _Block = _Getblock(_Newoff);
if (_Map()[_Block] == nullptr) {
_Map()[_Block] = _Getal().allocate(_Block_size);
}

_Alty_traits::construct(
_Getal(), _Unfancy(_Map()[_Block] + _Newoff % _Block_size), _STD forward<_Tys>(_Vals)...);

++_Mysize();
}
/// ...

/// _Deque_val
static constexpr int _Block_size = _Bytes <= 1 ? 16
: _Bytes <= 2 ? 8
: _Bytes <= 4 ? 4
: _Bytes <= 8 ? 2
: 1; // elements per block (a power of 2)

_Block_size_Deque_val中定义,总是2的阶乘,默认构造初始化,_Map_Mapsize_Myoff_Mysize都为0,所以会调用_Growmap(1)

_Growmap

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
void _Growmap(size_type _Count) { // grow map by at least _Count pointers, _Mapsize() a power of 2
static_assert(_Minimum_map_size > 1, "The _Xlen() test should always be performed.");

_Alpty _Almap(_Getal());
size_type _Newsize = _Mapsize() > 0 ? _Mapsize() : 1;
// class deque: static constexpr int _Minimum_map_size = 8;
while (_Newsize - _Mapsize() < _Count || _Newsize < _Minimum_map_size) {
// scale _Newsize to 2^N >= _Mapsize() + _Count
if (max_size() / _Block_size - _Newsize < _Newsize) {
_Xlen(); // result too long
}

_Newsize *= 2;
}
_Count = _Newsize - _Mapsize();

size_type _Myboff = _Myoff() / _Block_size;
_Mapptr _Newmap = _Almap.allocate(_Mapsize() + _Count);
_Mapptr _Myptr = _Newmap + _Myboff;

_Myptr = _STD uninitialized_copy(_Map() + _Myboff, _Map() + _Mapsize(), _Myptr); // copy initial to end
if (_Myboff <= _Count) { // increment greater than offset of initial block
_Myptr = _STD uninitialized_copy(_Map(), _Map() + _Myboff, _Myptr); // copy rest of old
_Uninitialized_value_construct_n_unchecked1(_Myptr, _Count - _Myboff); // clear suffix of new
_Uninitialized_value_construct_n_unchecked1(_Newmap, _Myboff); // clear prefix of new
} else { // increment not greater than offset of initial block
_STD uninitialized_copy(_Map(), _Map() + _Count, _Myptr); // copy more old
_Myptr = _STD uninitialized_copy(_Map() + _Count, _Map() + _Myboff, _Newmap); // copy rest of old
_Uninitialized_value_construct_n_unchecked1(_Myptr, _Count); // clear rest to initial block
}

if (_Map() != nullptr) {
_Destroy_range(_Map(), _Map() + _Mapsize());
_Almap.deallocate(_Map(), _Mapsize()); // free storage for old
}

_Map() = _Newmap; // point at new
_Mapsize() += _Count;
}

_Mapsize的值为0,_Newsize取1,否则,_Newsize_Mapsize的值。当_Newsize的值小于_Minimum_map_size也就是8的时候,并且_Count为1,所以_Newsize最后会取到8,_Count也会更新到8,然后开辟_Mapsize + _Count大小的_Newmap,然后copy原_Map上的内容到新_Newmap上,并通过_Uninitialized_value_construct_n_unchecked1根据value_type将其他新开辟的区域进行初始化,最后析构并释放原_Map,并更新_Map_Mapsize

_Uninitialized_value_construct_n_unchecked1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
template <class _NoThrowFwdIt, class _Diff>
_NoThrowFwdIt _Uninitialized_value_construct_n_unchecked1(_NoThrowFwdIt _UFirst, _Diff _Count) {
// value-initialize all elements in [_UFirst, _UFirst + _Count_raw)
_STL_INTERNAL_CHECK(_Count >= 0);
if constexpr (_Use_memset_value_construct_v<_NoThrowFwdIt>) {
return _Zero_range(_UFirst, _UFirst + _Count);
} else {
_Uninitialized_backout<_NoThrowFwdIt> _Backout{_UFirst};
for (; 0 < _Count; --_Count) {
_Backout._Emplace_back();
}

return _Backout._Release();
}
}

根据value_type进行零初始化或者默认构造,我们再回到_Emplace_back_internal

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// ...
_Myoff() &= _Mapsize() * _Block_size - 1;
size_type _Newoff = _Myoff() + _Mysize();
size_type _Block = _Getblock(_Newoff);
if (_Map()[_Block] == nullptr) {
_Map()[_Block] = _Getal().allocate(_Block_size);
}

_Alty_traits::construct(
_Getal(), _Unfancy(_Map()[_Block] + _Newoff % _Block_size), _STD forward<_Tys>(_Vals)...);

++_Mysize();
}

// _Getblock()
size_type _Getblock(size_type _Off) const noexcept {
// NB: _Mapsize and _Block_size are guaranteed to be powers of 2
return (_Off / _Block_size) & (_Mapsize - 1);
}

_Myoff获得当前所在的元素的位置,_Newoff获得下一个元素的位置,通过_Getblock(_Newoff)获得下一个元素_Block的位置,若_Map[_Block]此处未申请_Block,申请个_Block,并构造一个value,最后++_Mysize

Push_front

1
2
3
void push_front(const _Ty& _Val) {
emplace_front(_Val);
}

emplace_front

1
2
3
4
5
template <class... _Valty>
decltype(auto) emplace_front(_Valty&&... _Val) {
_Orphan_all();
_Emplace_front_internal(_STD forward<_Valty>(_Val)...);
}

_Emplace_front_internal

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template <class... _Tys>
void _Emplace_front_internal(_Tys&&... _Vals) {
if (_Myoff() % _Block_size == 0 && _Mapsize() <= (_Mysize() + _Block_size) / _Block_size) {
_Growmap(1);
}
_Myoff() &= _Mapsize() * _Block_size - 1;
size_type _Newoff = _Myoff() != 0 ? _Myoff() : _Mapsize() * _Block_size;
const size_type _Block = _Getblock(--_Newoff);
if (_Map()[_Block] == nullptr) {
_Map()[_Block] = _Getal().allocate(_Block_size);
}

_Alty_traits::construct(
_Getal(), _Unfancy(_Map()[_Block] + _Newoff % _Block_size), _STD forward<_Tys>(_Vals)...);

_Myoff() = _Newoff;
++_Mysize();
}

_Emplace_front_internal实际上和_Emplace_back_internal差不多,但是当_Myoff值为0时,_Newoff就会赋值为最后一个block的最后一个位置。再push_front就相当于从后往前放置元素