Life of a Pixel: study notes on the Chromium browser engine's rendering pipeline

How does front-end content in the Chromium browser engine finally become pixels on the screen? This rendering process is a long series of steps in a complex piece of engineering. For these steps, we need to grasp the design ideas, the data models, and the interactions between the data models at each stage. A careful understanding of these aspects helps us stay oriented and find our bearings when reading the huge source tree. Below I discuss this process in detail based on my understanding of Life of a Pixel.

Input and output of Life of a Pixel

First, the input and output. The input to this series of steps is called Web Content. It consists mainly of text that describes web content as defined by a set of established standards, plus other referenced resources. These standards (what we usually call front-end languages) are HTML, CSS, and JavaScript, which define the structure, style, and logic of web content, respectively. This division of labor is not strict, though: JavaScript is taking on more and more responsibility under the currently popular front-end design approach (separation of front end and back end), and it keeps becoming more independent; in recent years it has gradually stepped outside the browser platform to play a role on its own (Node.js, React VR).

On the other hand, the output. Drawing pixels on the screen involves the theory and engineering of computer graphics. Traditionally, a computer goes through roughly the following steps to put content on screen: the application translates the graphics it wants to show into calls to the operating system and graphics libraries (OpenGL, DirectX, etc.); these libraries, together with drivers and other OS services, send data and commands to the hardware (the GPU, etc.); the hardware's compute cores and memory carry out steps such as rasterization; and finally the contents of hardware (GPU) memory are converted into a signal sent to the screen. In this respect, the API provided by a library such as OpenGL is relatively low-level.

I have done some OpenGL programming myself. Although modern OpenGL provides models and conventions (such as the pipeline) to simplify the work, those models and the APIs designed around them still have an obvious hardware flavor. With OpenGL, the caller must supply precisely, whether preset or computed by GPU programs, the position, color, and drawing order of the content in a geometric coordinate system.

From the input and output we can roughly infer the browser's job: accurately interpret the front-end developer's description of the web page, written in high-level front-end standards (front-end languages) together with its associated multimedia and data, and compute from it all the information the graphics library requires. The complexity of current front-end standards, together with compatibility and robustness requirements, makes this step very complex and large; the demands on performance and stability raise the difficulty of design and implementation further. Achieving all of this is not only a technical problem but also a software-engineering one.

Overview of the page life cycle

From the above discussion of input and output, we can understand the Chromium team's view of the page life cycle as follows.

  1. Relevant data models are generated from the Web Content through several steps.
  2. Those data models are then continuously updated in response to time, interaction, and other factors.

For the second point, updates should modify as little as possible of the data models generated in the first step, and reduce the computational cost as much as possible. The reason is that the computation needed to generate the data models in the first step, and the interactions between those models, are still very expensive at today's average performance levels, so repeatedly redoing the first step is not practical in a real environment.

Introduction to preliminary rendering steps

We will discuss the steps between this input and output in the following order.

DOM

For HTML, the grammar has an obviously tree-like character, which makes it natural to describe a document's structure, along with part of its content, in HTML. We therefore need to extract the structure and content information. In this step, the HTML document parser parses the text of the HTML document and converts it into a DOM tree.
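
As a mental model only (these are toy types, not Blink's actual node classes), a DOM tree can be pictured as nodes holding a tag name, text, and children:

// Toy sketch of a DOM tree node; Blink's real blink::Node hierarchy is far richer.
#include <memory>
#include <string>
#include <vector>

struct DomNode {
  std::string tag;                                 // e.g. "html", "body", "p"
  std::string text;                                // text content, if any
  std::vector<std::unique_ptr<DomNode>> children;  // tree structure
};

// "<html><body><p>hello</p></body></html>" parses into:
//   DomNode{"html"} -> DomNode{"body"} -> DomNode{"p", "hello"}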

Regarding the DOM tree, I have to mention JavaScript's ability to manipulate it, which I consider one of JavaScript's core capabilities. With it, JavaScript can control and update a page, adding, changing, and deleting its contents; I see this as the technical basis for the separation of front end and back end. The actual manipulation of the DOM tree is carried out through the V8 engine, which exposes the JavaScript-facing APIs for manipulating the DOM tree, giving JavaScript this capability.

CSS (style)

CSS has two main functions: selecting the HTML elements it applies to, and defining styles for those elements. For our purposes, the essence is to select (find) DOM nodes and map style information onto them.

One problem deserves attention: across the style declarations that apply to one or several DOM nodes, there may be missing, duplicated, conflicting, or invalid definitions. To resolve this, Chromium performs style recalculation (recalc), which computes the full Computed Style for every node of the DOM tree.
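
A toy sketch of the recalc idea (the names and the cascade here are heavily simplified illustrations, not Blink's real selector matching):

#include <algorithm>
#include <map>
#include <string>
#include <vector>

// One parsed rule: a selector, a specificity score, and its declarations.
struct StyleRule {
  std::string selector;                      // e.g. "p" (only tag selectors here)
  int specificity;                           // higher wins in the cascade
  std::map<std::string, std::string> decls;  // e.g. {"color", "red"}
};

// Recalc for one node: apply matching rules in specificity order, so more
// specific declarations overwrite less specific ones and conflicts resolve.
std::map<std::string, std::string> ComputeStyle(std::vector<StyleRule> rules,
                                                const std::string& tag) {
  std::sort(rules.begin(), rules.end(),
            [](const StyleRule& a, const StyleRule& b) {
              return a.specificity < b.specificity;
            });
  std::map<std::string, std::string> computed;
  for (const auto& rule : rules) {
    if (rule.selector != tag) continue;  // "matching", reduced to tag equality
    for (const auto& [prop, value] : rule.decls)
      computed[prop] = value;             // last (most specific) writer wins
  }
  return computed;
}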

Layout

With the information provided by the two steps above, we need a further transformation: converting DOM nodes together with their Computed Style into visual geometry (Layout). The problems to solve in this step include the final position, arrangement, and size of text, tables, and other elements on the page. To compute and organize this information efficiently, Chromium builds a Layout tree, a data structure that holds the structural information and carries out the work described above.

In the code, a structure like the following describes this kind of information (note the CGFloat/CGRect types: this particular snippet appears to come from Chrome's iOS UI code rather than from Blink's layout engine).

// A LayoutRect contains the information needed to generate a CGRect that may or
// may not be flipped if positioned in RTL or LTR contexts. |boundingWidth| is
// the width of the bounding coordinate space in which the resulting rect will
// be used. |position| is used to describe the location of the resulting frame,
// and |size| is the size of resulting frame.
struct LayoutRect {
  CGFloat boundingWidth;
  LayoutRectPosition position;
  CGSize size;
};

The relationship between Layout tree nodes and DOM tree nodes is not one-to-one. In some cases a DOM node needs no Layout object of its own, or its layout information can live in another related Layout object (usually the one for its parent node). One thing that is changing here is the move from the legacy layout objects to LayoutNG. LayoutNG was proposed to solve the problem that, in the legacy Layout objects, input, output, and intermediate state are mixed together, and parent and child nodes refer to each other during the computation. That original design created the problem of deciding whether a node's data is valid: a node being laid out has to judge whether the data it reads is already in its final state, otherwise its own result may have to be recomputed later, which complicates the algorithm design. In LayoutNG, by contrast, input and output are separated; once an output is produced, its state is final and cannot be modified, and the structure is clear, so the algorithms can be simpler and more efficient.
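
The shape of that separation can be sketched as a pure function from immutable input to immutable output (a rough caricature of the idea; the types are simplified stand-ins, not LayoutNG's real ConstraintSpace and LayoutResult):

#include <vector>

// Immutable input: the constraints a parent imposes on a child.
struct ConstraintSpace {
  float available_width;
  float available_height;
};

// Immutable output: once produced, never modified again.
struct LayoutResult {
  float x, y;                          // resolved position
  float width, height;                 // resolved size
  std::vector<LayoutResult> children;  // child results, also final
};

struct LayoutNode { /* a node of the layout tree (elided) */ };

// LayoutNG-style entry point: the same inputs always yield the same output,
// and a node never mutates its parent or siblings while computing.
LayoutResult Layout(const LayoutNode& node, const ConstraintSpace& space);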

Paint

From the information in the Layout tree we can compute more basic information: coordinate positions, sizes, colors, drawing order, and so on. In the Paint step, we convert the Layout tree into a list of paint operations. In the update process, an earlier step splits the Layout tree into independent layers, and this step generates an independent list of drawing operations for each layer. Along the way, structural information has to be turned into a drawing stack order; each node in that stack is a Paint Phase with a relatively independent drawing procedure. Concretely, in the code the Paint step produces a set of DrawXXXOp data structures.
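
A minimal sketch of such a display list (loosely modeled on cc's paint ops; these types are illustrative, not Chromium's):

#include <cstdint>
#include <vector>

struct Rect { float x, y, w, h; };

// One recorded operation. cc has many DrawXXXOp variants (DrawRectOp,
// DrawTextBlobOp, DrawImageOp, ...); a single tagged struct stands in here.
struct DrawOp {
  enum class Type { kDrawRect, kDrawText } type;
  Rect rect;
  uint32_t color;  // ARGB
};

// Paint does not produce pixels; it only records what to draw, in order.
// The list is replayed later, at raster time.
struct DisplayItemList {
  std::vector<DrawOp> ops;
  void push(const DrawOp& op) { ops.push_back(op); }
};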

Raster

Rasterization converts the recorded drawing information into bitmaps in memory. The DrawXXXOp data structures generated by the previous step are played back during the raster step (with embedded images decoded through the ImageDecodeCache). Raster is performed not in the renderer process but in the GPU process (as a matter of design). The reason is that this step needs to issue GL calls (proxy functions dynamically bound at runtime to real OpenGL API addresses), and the renderer process is forbidden to make GL calls directly by the browser's sandbox policy. This effectively prevents malicious code from exploiting vulnerabilities in the actual OpenGL APIs, and keeps instability caused by defects in GL or in drivers from crashing the renderer process, either of which would reduce the overall stability of the browser. During this process, the DrawXXXOp data structures are passed from the renderer process to the GPU process, which replays each op.

The replay of each op goes through the interfaces provided by Skia to generate the GL calls. Compared with the raw GL functions, Skia provides higher-level computer-graphics operations. Skia is also used in other Google projects, such as Android.
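
To make Skia's role concrete, here is a standalone CPU-raster sketch. This is plain Skia usage, not Chromium's raster code, and the SkSurface::MakeRasterN32Premul factory is from the classic Skia API (newer releases have moved and renamed these factories):

#include "include/core/SkCanvas.h"
#include "include/core/SkImage.h"
#include "include/core/SkPaint.h"
#include "include/core/SkSurface.h"

void RasterExample() {
  // A CPU-backed surface: an uncompressed 256x256 bitmap in main memory.
  sk_sp<SkSurface> surface = SkSurface::MakeRasterN32Premul(256, 256);
  SkCanvas* canvas = surface->getCanvas();

  // Replaying a recorded DrawRectOp boils down to calls like these.
  canvas->clear(SK_ColorWHITE);
  SkPaint paint;
  paint.setColor(SK_ColorBLUE);
  canvas->drawRect(SkRect::MakeXYWH(10, 10, 100, 50), paint);

  // The surface now holds rasterized pixels; in Chromium the equivalent
  // output lands in a GPU-side texture rather than a CPU bitmap.
  sk_sp<SkImage> snapshot = surface->makeImageSnapshot();
  (void)snapshot;
}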

A few more important points about the GPU process. The GPU process can be restarted by the browser after a crash without the user noticing, and one GPU process can serve multiple Web Content renderer processes as well as UI rendering. On Windows, GL calls are ultimately translated into DirectX calls through ANGLE. The reason is that OpenGL support on Windows is in practice not ideal (my personal inference: factors such as performance and driver support). Although DirectX is slightly inferior to OpenGL in precision (most professional industrial-design software uses OpenGL), it is stronger in graphics-driver compatibility and performance; some OpenGL APIs are poorly supported or not supported at all by ordinary graphics drivers, which is one reason professional graphics cards and their drivers are expensive. It is worth mentioning that DirectX is created and maintained by Microsoft itself.

Conclusion

The results of this initial rendering remain in memory (main memory and video memory): the relevant data structures together with the data they need, computed or generated along the way. These structures and their data serve as input and support for the update process described next.

Overview of update steps

Because of the large scope involved, Paint and Raster are very expensive operations. A browser needs to sustain 60 FPS or more, which leaves a budget of roughly 16.7 ms per frame; below that line users may perceive jank, hurting the experience. The design idea behind the update steps is therefore to run Paint and Raster over as few data structures as possible. Under this idea, I see two main directions of effort: refining the granularity of operations, and prioritizing operations so that what the user can see stays responsive. A further difficulty is JavaScript's single-threaded nature: when JavaScript performs expensive work that blocks the main thread, how can responses that can already be handled with existing information be processed as quickly as possible? The mechanisms designed for the update steps are hard to understand; below I discuss them based on my own understanding.

Compositing

Chromium introduces compositing as its solution to the design goals and difficulties above. Under this solution, the page is broken up into layers that are rendered independently, and these layers are processed and drawn by a thread called the compositor thread. On the main thread, the Layout tree is mapped through Paint Layers to cc::Layers, the independent layers just mentioned; cc::Layer is also the basic unit that the compositor operates on. A Paint Layer can be understood as a candidate cc::Layer, which is selected, merged, and converted according to context and the corresponding mechanisms. The cc::Layer list is constructed from Graphics Layers through UpdateAssignmentIfNeeded. These steps all belong to the compositing-assignment step, though they are planned to move into Paint in the future (the materials I have do not give the reasons).

// Base class for composited layers. Special layer types are derived from
// this class. Each layer is an independent unit in the compositor, be that
// for transforming or for content. If a layer has content it can be
// transformed efficiently without requiring the content to be recreated.
// Layers form a tree, with each layer having 0 or more children, and a single
// parent (or none at the root). Layers within the tree, other than the root
// layer, are kept alive by that tree relationship, with refpointer ownership
// from parents to children.
class CC_EXPORT Layer : public base::RefCounted<Layer> {
  ...
};

Prepaint

This step attaches drawing properties to the layers (cc::Layer); the properties are bound to the layers at this point. The available materials do not say much about this step, but personally I think it can be understood by analogy with Photoshop's layer model and its per-layer adjustment properties (black-and-white, brightness, contrast, etc.). Studying this properly will require additional documentation or source reading.
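
Concretely, what pre-paint produces are the property trees that get committed in the next step (the "property tree" mentioned under Commit & Tiling below): instead of every layer owning its transform, clip, and effect state, layers reference shared nodes in a few small trees. A simplified sketch (illustrative types, not cc's real ones):

#include <vector>

// Each node points at its parent by index; resolving a layer's final state
// means walking from its node up to the root of each tree.
struct TransformNode { int parent; float translate_x, translate_y, scale; };
struct ClipNode      { int parent; float x, y, w, h; };
struct EffectNode    { int parent; float opacity; };

struct PropertyTrees {
  std::vector<TransformNode> transform;
  std::vector<ClipNode> clip;
  std::vector<EffectNode> effect;
};

// A layer stores only indices into the trees, not the properties themselves.
struct LayerRef { int transform_id, clip_id, effect_id; };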

Commit & Tiling

After the main thread finishes painting, the updated data structures, the cc::Layer list and the property trees, are synchronized (committed) to the compositor's impl thread (the compositor thread mentioned above). After this synchronization, the impl side extracts the parts of each layer that need drawing and divides them into smaller-grained tiles. The Tile is the basic unit of rasterization work: it records the position on the page of the part to be rasterized, the drawing steps, and other related information. Generated tiles are put into a tile pool, and raster threads perform rasterization in priority order; the priority is derived from the distance between the browser viewport and the location the tile covers. There may be multiple raster threads, and they live in the GPU process.
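
A sketch of that prioritization rule (my own illustration of the idea, not cc's actual TilePriority code):

#include <algorithm>
#include <cmath>

struct Rect { float x, y, w, h; };

// Distance between two rects: 0 if they overlap, else the size of the gap.
float Distance(const Rect& a, const Rect& b) {
  float dx = std::max({a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0f});
  float dy = std::max({a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0f});
  return std::sqrt(dx * dx + dy * dy);
}

// Tiles intersecting the viewport raster first; farther tiles raster later.
float TilePriority(const Rect& viewport, const Rect& tile) {
  return -Distance(viewport, tile);  // higher value = raster sooner
}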

After rasterization, each Tile produces a Quad: a command to draw that (already rasterized) tile at a specific position on the screen. The concrete path is Tiles -> AppendQuads() -> CompositorFrame; the CompositorFrame contains the list of DrawQuads.
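
Schematically (simplified stand-ins; viz's real DrawQuad and CompositorFrame carry far more state):

#include <cstdint>
#include <vector>

struct Rect { float x, y, w, h; };

// One quad: "draw this rastered texture at this screen-space rectangle".
struct DrawQuad {
  uint32_t texture_id;  // the tile's raster output
  Rect target_rect;     // where it lands on screen
};

struct CompositorFrame {
  std::vector<DrawQuad> quad_list;
};

// AppendQuads-style step: every tile that is ready contributes one quad.
void AppendQuads(const std::vector<DrawQuad>& ready_tiles,
                 CompositorFrame& frame) {
  for (const DrawQuad& quad : ready_tiles)
    frame.quad_list.push_back(quad);
}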

Note that the CompositorFrame is the output of the renderer process, and it is also the data structure passed between the renderer process and the GPU process. Tile rasterization is generally performed in the GPU process, which gives better performance.

Activate & Draw

To further improve the efficiency of Commit & Tiling, and to smooth over possible mismatches in pace between committing and drawing, Chromium maintains two pipelined trees on the impl thread of the renderer process: the Pending Tree and the Active Tree. The Pending Tree receives the layer list (layers plus property trees) committed from the main thread to the impl thread and rasters it as it arrives. The Active Tree holds the rasterized layers (including the raster results) and performs the draw operation. This multi-stage pipeline makes it possible to draw one frame while rastering the next, improving throughput.
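
The double-tree pipeline can be sketched like this (an illustration of the idea, not cc's real LayerTreeHostImpl):

#include <memory>
#include <utility>

struct LayerTree { /* layers + property trees + raster state (elided) */ };

struct ImplThread {
  std::unique_ptr<LayerTree> pending;  // being rastered
  std::unique_ptr<LayerTree> active;   // being drawn from

  // A commit from the main thread replaces the pending tree.
  void Commit(std::unique_ptr<LayerTree> tree) { pending = std::move(tree); }

  // Once the pending tree's visible tiles are all rastered, it is promoted
  // to the active tree in one atomic swap, so drawing never sees a
  // half-rastered frame.
  void Activate() { active = std::move(pending); }
};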

Display

Here a broader model needs describing. In Chromium, renderer processes and the GPU process are not one-to-one; in practice, multiple renderer processes usually correspond to one GPU process. My personal inference is that each tab corresponds to one renderer process. Note also that the compositor in the browser process's UI framework communicates with the GPU process as well. So the GPU process can be understood as responsible for the rasterization and drawing of the entire running application.

CompositorFrames are passed to the GPU process from each of the parties it communicates with, and each frame is associated with a surface (the place it appears on screen). There is a concept of surface aggregation, about which the available materials say little; personally, I think it involves combining and optimizing compositor frames whose surfaces overlap.

The remaining operations take place in the GPU process. Using the information carried by the Quads in the compositor frames, it generates and executes a set of GL calls. The GL calls are serialized through the command buffer as proxy calls. This happens on the viz thread; the real GL calls are made on the gpu main thread (a separate thread within the GPU process), which finally performs the actual drawing to the screen through the OpenGL API.

In the newer Display mode of operation, the viz thread uses Skia for the drawing operations and passes the result (a deferred display list data structure) to the gpu main thread; the Skia backend on that thread then makes the actual GL (or Vulkan) calls based on the information it receives.

Some final details

Since modern displays generally use double buffering, at any given moment one buffer is being drawn into while the other is being displayed. When drawing into the first completes, a Swap operation puts its content (the frame) on screen while the other buffer becomes the target for drawing the next frame, and so on repeatedly.
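
A minimal sketch of the swap (illustrative only; real swaps go through the windowing system or a swap-chain API):

#include <cstdint>
#include <utility>
#include <vector>

struct Framebuffer { std::vector<uint32_t> pixels; };

struct DoubleBuffer {
  Framebuffer front;  // currently scanned out to the display
  Framebuffer back;   // currently being drawn into

  // After the back buffer is fully drawn, exchange the roles: the finished
  // frame becomes visible and the old front buffer is reused for drawing.
  void Swap() { std::swap(front, back); }
};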

Both rasterization and display are performed in the GPU process; in the case of rasterization, the point is to let the GPU accelerate the work.
