Knowledge points for writing C++ strategies: C++ memory reclamation

Author: Inventors quantify - small dreams, Created: 2017-12-29 11:00:02, Updated: 2017-12-29 11:25:11

Before writing C++ strategies there is some basic knowledge you need. You do not have to master all of it, but you should at least know these rules. The following is a reprint:

  • #### C++ Memory Objects

If a person who claims to be a good programmer knows nothing about memory, then I can tell you that he is probably bragging. Writing programs in C or C++ demands close attention to memory, not only because the way memory is allocated directly affects a program's efficiency and performance, but more importantly because a moment of carelessness when manipulating memory causes problems that are often hard to detect, such as memory leaks and dangling pointers.

We know that C++ divides memory into three logical regions: the stack, the heap, and the static storage area. Accordingly, I call the objects located in them stack objects, heap objects, and static objects. So what are the differences between these kinds of memory objects?

  • 1 Basic concepts

    Let's first look at the stack, which is generally used to store local variables or objects, such as those declared inside a function definition with a statement like:

    Type stack_object ; 
    

    stack_object is a stack object whose life begins at the definition point and ends when its function returns.

    Also, almost all temporary objects are stack objects. For example, consider the following function declaration:

    Type fun(Type object);
    

    This function produces at least two temporary objects. First, the parameter is passed by value, so the copy constructor is called to generate a temporary object object_copy1; what the function uses internally is not object but object_copy1. Naturally, object_copy1 is a stack object, and it is released when the function returns. Second, the function returns by value, so when it returns it also produces a temporary object object_copy2, which is released some time after the function returns. For example, suppose a function contains the following code:

    Type tt ,result ; // creates two stack objects

    tt = fun(tt); // when the function returns, a temporary object object_copy2 is generated
    

    The second statement above is executed as follows: when fun returns, a temporary object object_copy2 is first generated, and then the assignment operator is called to perform:

    tt = object_copy2 ; // calls the assignment operator
    

    See? The compiler generates this many temporary objects for us without our noticing, and the time and space cost of generating them can be considerable. So you can probably see why, for large objects, it is better to pass function parameters by const reference than by value.

    Next, let's look at the heap. The heap, also known as the free store, is allocated dynamically during program execution, so its defining characteristic is its dynamic nature. In C++, heap objects are created and destroyed by the programmer, so memory problems arise when they are handled carelessly. If a heap object is allocated but never released, a memory leak occurs.

    So how do you allocate a heap object in C++? The only way is to use new (of course, you can also obtain C-style heap memory with malloc). new allocates a block of memory on the heap and returns a pointer to the object.

    Now let's look at the static storage area. All static objects and global objects are allocated in the static storage area. Global objects are allocated before main() starts executing. Conceptually, before the explicit code in main() runs, a compiler-generated _main() function is called, and _main() constructs and initializes all global objects; before main() ends, a compiler-generated exit function is called to release all global objects (the exact mechanism and names depend on the implementation). For example, the following code:

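    int main(void)
    {
        ... // explicit code
    }

    // is actually transformed into something like this:

    int main(void)
    {
        _main(); // implicit code generated by the compiler to construct all global objects
        ...      // explicit code
        ...
        exit() ; // implicit code generated by the compiler to release all global objects
    }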
    

    Knowing this, we can derive some tricks. Suppose we need to do some preparation before main() executes: we can put that preparation into the constructor of a custom global object, so that the constructor is called and performs the expected actions before the explicit code of main() runs. In the same way, work that must happen after main() finishes can go into the destructor of a global object.

    So far we have talked about global objects in the static storage area. What about local static objects? A local static object is defined inside a function just like a stack object, but with the static keyword in front. It is created the first time execution reaches its declaration and is destroyed only when the whole program ends.

    There is one more kind of static object: a static data member of a class. This case raises some more subtle questions.

    The first question is the lifetime of a class's static member object. Like a global object, it has static storage duration: it is defined once outside the class body, is normally constructed before main() runs, and lives until the end of the program. Its lifetime does not depend on objects of the class. It exists even if the program never creates a single object of that class, and however many objects are created, they all share that one static member.

    The second question arises in the following situation:

    class Base
    {
    public:
        static Type s_object ;
    };

    class Derived1 : public Base // public inheritance
    {
        ... // other data
    };

    class Derived2 : public Base // public inheritance
    {
        ... // other data
    };

    Base example ;
    Derived1 example1 ;
    Derived2 example2 ;

    example.s_object = …… ;
    example1.s_object = …… ;
    example2.s_object = …… ;
    

    Note the last three statements: is the s_object they access the same object? The answer is yes, they all refer to the same object. That may not sound right, but it is true, and you can write a simple piece of code to verify it yourself. What I want to do here is explain why.

    Imagine that we pass an object of type Derived1 to a function that accepts a Base parameter by value rather than by reference. Slicing then occurs. How is the slicing done? You probably already know: the Base subobject is simply taken out of the Derived1 object, all data members added by Derived1 are ignored, and that subobject is passed to the function (in fact, the function uses a copy of the subobject).

    Every object of a class derived from Base contains a subobject of type Base (this is the key to why a Base pointer can point to a Derived1 object, and therefore also the key to polymorphism), and all of these subobjects and all objects of type Base share the same s_object. Naturally, then, instances of every class in the inheritance hierarchy rooted at Base share the same s_object.

    [Figure: object layout of example, example1, and example2]

  • 2 Comparison of three memory objects

    The advantage of stack objects is that they are created automatically at the appropriate time and destroyed automatically at the appropriate time, with no effort from the programmer. Creating a stack object is also generally faster than creating a heap object: allocating a heap object calls operator new, which uses some kind of memory-space search algorithm that can be time-consuming, whereas creating a stack object only requires moving the stack-top pointer. Note, however, that stack space is usually fairly small, generally 1 MB to 2 MB, so large objects are not suitable for allocation on the stack.

    The creation and destruction of heap objects are both decided by the programmer; that is, the programmer has complete control over a heap object's lifetime. We often need such objects. For example, when an object must be accessed by several functions but we do not want to make it global, creating a heap object is a good option: we pass the pointer to the heap object between those functions and so share the object.

    Let's look at static objects next.

    First, global objects. Global objects provide the simplest way for classes and functions to communicate, although it is not an elegant way. In general, fully object-oriented languages such as C# have no global objects, because global objects mean insecurity and tight coupling.

    Then there is the static member of the class, which, as mentioned above, is shared by all objects of the base class and its derived classes, so it is a good choice when data sharing or communication is needed between these classes or between these class objects.

    Finally, there is the static local object, which is mainly used to keep intermediate state across repeated calls of the function it lives in. The most prominent example is the recursive function. We all know that a recursive function is one that calls itself. If a non-static local object is defined in a recursive function, the overhead becomes large when the recursion is deep. This is because non-static local objects are stack objects: every recursive call creates one and every return releases it, and each such object is visible only to the current call level, not to deeper or shallower levels.

    In recursive function design, a static object can replace a non-static local object (that is, a stack object). This not only removes the cost of creating and releasing the object on every recursive call and return, but also lets the static object keep the intermediate state of the recursion and be accessible to every call level.

  • 3 Unexpected benefits of using stack objects

    As introduced earlier, a stack object is created at the appropriate time and released automatically at the appropriate time; in other words, stack objects are managed automatically. So when is a stack object released automatically? First, at the end of its lifetime; second, when an exception is thrown in the function it belongs to.

    When a stack object is released automatically, its destructor is called. If we wrap a resource in a stack object and release that resource in the object's destructor, the probability of a resource leak drops sharply, because the stack object releases the resource automatically even when its function throws an exception. What actually happens is this: when a function throws an exception, so-called stack unwinding takes place. Since stack objects naturally live on the stack, their destructors are executed during unwinding, which releases the resources they wrap.

  • 4 Prohibit the creation of heap objects

    Building on the above, suppose you decide to prohibit heap objects of a certain type. You can then create your own resource wrapper class whose objects can only be created on the stack, so that the wrapped resource is released automatically when an exception occurs.

    So how do we prohibit heap objects? We already know that the only way to create a heap object is the new operation, so forbidding new does the job. Furthermore, the new operation calls operator new when it executes, and operator new can be overloaded. So the approach is to make operator new private; for symmetry, it is best to make operator delete private as well.

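    #include <stdlib.h> // needed for the C-style memory allocation functions

    class Resource ; // the resource class to be wrapped

    class NoHashObject
    {
    private:
        Resource* ptr ; // points to the wrapped resource

        ... ... // other data members

        void* operator new(size_t size) // not a strict implementation, for illustration only
        {
            return malloc(size) ;
        }

        void operator delete(void* pp) // not a strict implementation, for illustration only
        {
            free(pp) ;
        }

    public:
        NoHashObject()
        {
            // acquire the resource to be wrapped here and make ptr point to it
            ptr = new Resource() ;
        }

        ~NoHashObject()
        {
            delete ptr ; // release the wrapped resource
        }
    };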
    

    NoHashObject is now a class that prohibits heap objects. If you write the following code:

    NoHashObject* fp = new NoHashObject() ; // compile-time error!

    delete fp ;

    it produces compile-time errors. Now that you know how to design a class that prohibits heap objects, you may have the same question I had: is it impossible to create a heap object of this type without changing the definition of NoHashObject? No, there is still a way, which I call the brute-force cracking method. C++ is so powerful that you can do almost anything you want with it.

    int main(void)
    {
        char* temp = new char[sizeof(NoHashObject)] ;

        // forced type conversion: obj_ptr now treats the memory as a NoHashObject
        NoHashObject* obj_ptr = (NoHashObject*)temp ;

        temp = NULL ; // prevent the NoHashObject from being modified through temp

        // another forced conversion: rp points to the ptr member of the NoHashObject in the heap
        Resource** rp = (Resource**)obj_ptr ;

        // initialize the ptr member of the NoHashObject that obj_ptr points to
        *rp = new Resource() ;

        // the members of the heap NoHashObject can now be used through obj_ptr
        ... ...

        delete *rp ; // release the resource

        temp = (char*)obj_ptr ;

        obj_ptr = NULL ; // prevent a dangling pointer

        delete [] temp ; // release the heap space occupied by the NoHashObject
    }

    The implementation above is cumbersome and is rarely used in practice (note that no NoHashObject constructor ever runs, so it is a demonstration rather than portable code), but I wrote it out anyway because understanding it helps us understand C++ memory objects. What is the most fundamental idea behind all the forced type conversions above? We can think of it this way:

    The data in a piece of memory does not change, and the type is the pair of glasses we wear. When we put on a pair of glasses, we interpret the data in memory according to the corresponding type, so different interpretations yield different information.

    A forced type conversion is essentially swapping to another pair of glasses and looking at the same piece of memory again.

    It is also worth noting that different compilers may lay out an object's member data differently. For example, most compilers place the ptr pointer member of NoHashObject at the very beginning of the object's storage (the first 4 bytes on a 32-bit platform), which is what makes the following conversion behave as we expect:

    Resource** rp = (Resource**)obj_ptr ;

    However, this is not necessarily the case for every compiler or every class; a class with virtual functions, for instance, often starts with a hidden vtable pointer.

    Since we can prohibit heap objects of a certain type, can we also design a class that cannot produce stack objects?

  • 5 Prohibit the creation of stack objects

    As mentioned earlier, when a stack object is created, the stack pointer is moved to make room of the appropriate size, and the corresponding constructor is then called directly on that space to form the stack object. When the function returns, the object's destructor is called to release it, and the stack pointer is adjusted to reclaim that stack memory. This process does not involve operator new/delete, so making operator new/delete private cannot achieve our goal. From the description above, though, you may already have thought of the answer: make the constructor or the destructor private, so that the system cannot call it and therefore cannot create the object on the stack.

    That does work, and it is what I am going to do. But first, one thing to consider: if we make the constructor private, we can no longer use new to create a heap object directly either, because new also calls the constructor after allocating space for the object. So I am going to make only the destructor private.

    Making the destructor private has another effect besides prohibiting stack objects: it also prevents inheritance. Indeed, if a class is not intended to be a base class, a common approach is to declare its destructor private.

    To prohibit stack objects without restricting inheritance, we can declare the destructor protected, which achieves both goals:

    class NoStackObject
    {
    protected:
        ~NoStackObject() { }

    public:
        void destroy()
        {
            delete this ; // calls the protected destructor
        }
    };
    

    Next, you can use the NoStackObject class like this:

    NoStackObject* hash_ptr = new NoStackObject() ;

    ... ... // operate on the object that hash_ptr points to

    hash_ptr->destroy() ;

    Isn't it a little strange that we create an object with new, but instead of deleting it with delete we call a destroy method? Users are clearly not used to this odd usage. So I decided to make the constructor private or protected as well. That brings us back to the question I tried to avoid above: how do we create an object without using new directly? We can do it indirectly, by having the class provide a static member function dedicated to creating heap objects of this type:

    class NoStackObject
    {
    protected:
        NoStackObject() { }

        ~NoStackObject() { }

    public:
        static NoStackObject* creatInstance()
        {
            return new NoStackObject() ; // calls the protected constructor
        }

        void destroy()
        {
            delete this ; // calls the protected destructor
        }
    };
    

    Now you can use the NoStackObject class like this:

    NoStackObject* hash_ptr = NoStackObject::creatInstance() ;

    ... ... // operate on the object that hash_ptr points to

    hash_ptr->destroy() ;

    hash_ptr = NULL ; // prevent use of a dangling pointer

    This feels better: creating and releasing the object are now done in a consistent way.

  • Garbage collection in C++

Many C or C++ programmers are dismissive of garbage collection. They believe that garbage collection is bound to be less efficient than managing dynamic memory themselves, and that it makes the program pause during collection, whereas if they manage memory themselves, allocation and release times are stable and never stall the program. Finally, many C/C++ programmers are convinced that a garbage collection mechanism cannot be implemented in C/C++. These misconceptions stem from unfamiliarity with garbage collection algorithms.

In fact, garbage collection is not necessarily slower than manual dynamic memory management, and can even be more efficient. Because we only allocate and never explicitly release, allocating memory only requires continually taking new memory from the heap, which amounts to moving the heap-top pointer; the release step is omitted, which naturally speeds things up. Modern garbage collection algorithms have advanced a great deal, and incremental collection algorithms allow collection to proceed in stages, avoiding long interruptions of the program.

Garbage collection algorithms are usually based on scanning and marking all memory blocks that may currently be in use, and then reclaiming the unmarked blocks from among all allocated memory. The view that garbage collection cannot be implemented in C/C++ usually rests on the belief that it is impossible to correctly scan all memory blocks that may still be in use, but what seems impossible is actually not complicated. First, by scanning memory data, pointers to dynamically allocated heap memory are easy to identify; and if a misidentification occurs, it can only treat non-pointer data as a pointer, never a pointer as non-pointer data. As a result, collection may fail to reclaim some garbage, but it does not reclaim memory that is still in use.

During collection, it is enough to scan the bss segment, the data segment, and the currently used stack space to find everything that may be a pointer to dynamic memory, and then recursively scan the memory those pointers reference, to obtain all the dynamic memory currently in use.

If you are willing to implement a good garbage collector for your project, it is possible to improve the speed of memory management and even reduce total memory consumption. If you are interested, you can search online for existing papers on garbage collection and for libraries that implement it; widening one's view this way is especially valuable for a programmer.

Translated from HK Zhang

  • #### Why does a local variable seem to live until the end of the program once its address has been stored in a pointer?

    #include <stdio.h>

    int* fun() {
        int k = 12;
        return &k;  // returns the address of a local variable
    }

    int main() {
        int* p = fun();
        printf("%d\n", *p);  // reads k after its lifetime has ended

        getchar();
        return 0;
    }

Not only can it be read, it can even be modified, but the result is not guaranteed. Local variables live on the program's own stack. After a local variable's scope ends, its value stays in place as long as that stack memory has not been handed to another variable, which is why the old value often still appears. Modifying it is more dangerous, because the memory may already have been given to another variable of the program, and forcibly changing it through the pointer can crash the program. In C++ terms, accessing an object after its lifetime has ended is undefined behavior, so this code must not be relied on.

csdn bbs

