Automatic Testing and Improvement of Machine Translation (TransRepair) paper reading summary

Not long ago I read the paper Automatic Testing and Improvement of Machine Translation, which proposes TransRepair, an automatic testing method for machine translation models that comes from the field of software testing. Below I summarize the paper from several aspects. Automatic Testing and Improvement of Machine Translation: TransRepair is a method for automatically detecting and repairing consistency problems in machine translation software. TransRepair provides a black box with…

Structure-Invariant Testing for Machine Translation (SIT) paper reading summary

Not long ago I read Structure-Invariant Testing for Machine Translation, a paper on detecting robustness problems in machine translation software. Below I share my understanding of it from several aspects. Main content of Structure-Invariant Testing for Machine Translation: SIT focuses on detecting robustness problems in machine translation software. It is built on "structural invariance", a metamorphic relation from metamorphic testing. The main steps of SIT are to select original sentences, generate similar sentences, and obtain from translation software...

Intel x86 processor interrupt handling: overview of key points

Processor interrupt handling is essential knowledge for learning computer architecture. On Intel x86 processors, external interrupts, exceptions, and traps are collectively referred to as interrupts. External interrupts come from hardware and occur at random. An exception originates inside the processor and indicates that an error condition was detected while the processor was executing an instruction. Traps come from programs and are generated by instructions such as INT n and INTO. External interrupts can be masked, but traps and exceptions cannot. An interrupt is masked by clearing the IF flag in the EFLAGS register. A program that handles an interrupt is called an interrupt handler, one that handles an exception is called an exception handler, and one that handles a trap is called a system call service routine. Handlers can be located anywhere within and space...
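The masking step can be illustrated with a small sketch that models EFLAGS as a plain integer. IF is bit 9 of EFLAGS; the function names `clear_if`, `set_if`, and `interrupts_enabled` are hypothetical, chosen here for illustration. On real hardware the flag is changed with privileged instructions such as CLI and STI, not by arithmetic on a value.

```python
IF_MASK = 1 << 9  # bit 9 of EFLAGS is the interrupt enable flag (IF)


def clear_if(eflags: int) -> int:
    """Return EFLAGS with IF cleared, i.e. maskable interrupts disabled."""
    return eflags & ~IF_MASK


def set_if(eflags: int) -> int:
    """Return EFLAGS with IF set, i.e. maskable interrupts enabled."""
    return eflags | IF_MASK


def interrupts_enabled(eflags: int) -> bool:
    """Report whether maskable external interrupts would be delivered."""
    return bool(eflags & IF_MASK)


# Example value: bit 1 is a reserved bit that always reads as 1, and IF is set.
eflags = 0x202
masked = clear_if(eflags)
print(hex(masked), interrupts_enabled(masked))
```

Only the IF bit changes; every other flag in the value is left untouched, which is why the sketch uses a mask instead of assigning a new constant.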

Intel x86 processor memory protection: summary of key points

Once the processor's memory protection mechanism is enabled, the processor checks every memory access to ensure that it satisfies the protection policy. Protection checking and address translation are performed in parallel. Protection checks include segment-level checks and page-level checks, performed in that order: segment first, then page. The checks draw on the segment descriptors, the page directory, and the page tables, and they are based on privilege levels. A privilege level is a number defined by Intel to implement protection. Segment-level checks include segment limit checks, segment type checks, privilege level checks, long pointer checks, etc. The principles of segment-level checking are: code at a low privilege level cannot access data at a high privilege level; code at a high privilege level can access data at a low privilege level; code can only use the stack that matches its own privilege level. When the privilege level switches, the stack …
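The privilege rules listed above can be sketched as two small predicates. This is a simplified model, not the full set of checks the processor performs. It assumes the usual IA-32 convention that a numerically smaller level is more privileged (0 is highest, 3 is lowest), and it adds the selector's requested privilege level (RPL), which the excerpt does not mention. The function names are hypothetical.

```python
def can_access_data(cpl: int, rpl: int, dpl: int) -> bool:
    """Data segment check: the less privileged of CPL and RPL must still be
    at least as privileged as the segment's DPL (smaller number = more
    privileged)."""
    return max(cpl, rpl) <= dpl


def can_use_stack(cpl: int, rpl: int, dpl: int) -> bool:
    """Stack segment check: the stack must sit at exactly the current
    privilege level."""
    return cpl == rpl == dpl


# Kernel code (level 0) reading user data (level 3) is allowed.
print(can_access_data(cpl=0, rpl=0, dpl=3))
# User code (level 3) reading kernel data (level 0) is rejected.
print(can_access_data(cpl=3, rpl=3, dpl=0))
# Kernel code cannot run on a user-level stack.
print(can_use_stack(cpl=0, rpl=0, dpl=3))
```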

Extended Concepts of Memory Management

Flat memory management. Segmentation is bypassed and paging is used exclusively. The practice is to define one code segment and one data segment, both 4GB in size. The logical address is then the same as the linear address. Protected flat memory management. The practice is to define a kernel code segment, a kernel data segment, a user code segment, and a user data segment. The base addresses of all four segments are 0 and each is 4GB in size. The kernel segments are used when the process executes kernel code, and the user segments are used when it executes user code. Segment address translation is effectively bypassed, but privilege-based protection is preserved. For the same process, since its uses of the four segments never overlap, the four segments can be superimposed and regarded as the flat address space of the process, and the four sets of page directories/page tables can also be merged into one set…
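A minimal sketch of why a flat setup makes the logical address equal to the linear address, assuming 32-bit addresses and a segment base of 0 as in the flat arrangements described above. `logical_to_linear` is a hypothetical helper that models only the base-plus-offset step of segment translation and omits limit and privilege checks.

```python
ADDRESS_MASK = 0xFFFFFFFF  # 32-bit linear address space (4GB)


def logical_to_linear(segment_base: int, offset: int) -> int:
    """Segment translation: linear address = segment base + offset,
    wrapped to 32 bits."""
    return (segment_base + offset) & ADDRESS_MASK


FLAT_BASE = 0  # flat model: the segment starts at linear address 0

offset = 0xC0100000
# With base 0 the offset passes through unchanged.
print(hex(logical_to_linear(FLAT_BASE, offset)))
# With a non-zero base the two addresses differ.
print(hex(logical_to_linear(0x00010000, 0x20)))
```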

Summary of paged memory management

In segment-paged memory management, the linear address space of a segment is divided into linear pages of equal size (4KB, 2MB, or 4MB, etc.). The physical memory space is also divided into physical pages of the same size. The operating system maintains a page table that manages the mapping of linear pages to physical pages. In the IA-32 architecture this mapping structure has two levels, namely the page directory and the page table. The page directory is an array whose elements are called page directory entries (PDEs), and each page directory entry describes a page table. The size of the page directory is one page (4KB). Each page directory entry is 4 bytes, so a page directory holds 1024 entries. A page table is also one page (4KB) in size. Page table entries are 4 bytes in size, so a page table can describe up to 1024 linear pages. Physical pages are pre-divided, which open…
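The sizes above determine how a 32-bit linear address is split under two-level paging with 4KB pages: 10 bits select one of the 1024 page directory entries, 10 bits select one of the 1024 page table entries, and 12 bits give the offset within the page. A minimal sketch follows; `split_linear_address` is a hypothetical name, and the 2MB and 4MB page sizes are not covered.

```python
PAGE_SIZE = 4096                         # 4KB page
ENTRY_SIZE = 4                           # bytes per PDE or PTE
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE  # 1024 entries per page


def split_linear_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into
    (page directory index, page table index, offset in page)."""
    pde_index = (addr >> 22) & 0x3FF  # bits 31..22
    pte_index = (addr >> 12) & 0x3FF  # bits 21..12
    offset = addr & 0xFFF             # bits 11..0
    return pde_index, pte_index, offset


print(split_linear_address(0x00403004))  # (1, 3, 4)

# One directory of 1024 tables, each mapping 1024 pages of 4KB, covers 4GB.
print(ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * PAGE_SIZE == 2**32)
```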

Summary of Segment Memory Management

The IA-32 architecture provides a segment-paged memory management mechanism: memory is first segmented and then paged. Paging is provided to support virtual memory. Segment: the addressable linear memory space of the processor is divided into several segments of different sizes. A segment is a contiguous range in the linear address space. Segments can hold code, data, stacks, or other data structures. The attributes of a segment are described by its segment descriptor, which is a data structure. Intel manages segment descriptors in segment descriptor tables. A segment descriptor table can be up to 64KB. When G is 0, the segment length is measured in bytes and the maximum segment length is 1MB. When G is 1, the segment length is measured in pages (4KB) and the maximum segment length is 4GB. DPL is the privilege level of the segment, and its value is between 0 and 3. S is the system flag...
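The 1MB and 4GB maxima follow from the 20-bit limit field of a descriptor combined with the G flag, and the 64KB table size follows from 8192 descriptors of 8 bytes each. A minimal sketch of that arithmetic for ordinary (expand-up) segments; `segment_size` is a hypothetical helper name.

```python
LIMIT_FIELD_MAX = 0xFFFFF   # the limit field of a descriptor is 20 bits wide
PAGE_SIZE = 4096            # granularity unit when G is 1
DESCRIPTOR_SIZE = 8         # bytes per segment descriptor
MAX_DESCRIPTORS = 8192      # a selector carries a 13-bit table index


def segment_size(limit_field: int, g: int) -> int:
    """Return the segment size in bytes for a given limit field and G flag."""
    if not 0 <= limit_field <= LIMIT_FIELD_MAX:
        raise ValueError("limit field must fit in 20 bits")
    units = limit_field + 1  # the limit is the last valid unit, so add 1
    return units * PAGE_SIZE if g else units


print(segment_size(LIMIT_FIELD_MAX, g=0) == 1 << 20)    # 1MB with G = 0
print(segment_size(LIMIT_FIELD_MAX, g=1) == 1 << 32)    # 4GB with G = 1
print(MAX_DESCRIPTORS * DESCRIPTOR_SIZE == 64 * 1024)   # 64KB table
```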

Hardware Platform Overview

Memory is storage space that the processor can access directly. To speed up memory access, computer systems usually provide caches, which are usually managed by hardware. An I/O device consists of an I/O controller and a physical device, and the processor manages the physical device through the I/O controller. The I/O controller is mainly composed of control and status registers (CSR) and data registers. The processor obtains the device status by reading the CSR, controls the device by writing the CSR, and exchanges data by reading and writing the data registers. The kernel typically abstracts an I/O device into a set of registers and assigns each register an I/O address. The processor accesses I/O registers through their I/O addresses. Many device registers in modern computer systems can be mapped...

IA-32 System Outline

The IA-32 architecture has three operating modes and one quasi-operating mode. Real mode: an 8086-compatible mode of operation, with some extensions. Protected mode: the basic operating mode of the processor, in which all instructions and all features of the processor are available to achieve maximum performance. System management mode: a transparent management mechanism provided to the operating system to implement special operations such as power management. Virtual 8086 mode is a quasi-operating mode that allows the processor to execute real-mode programs while in protected mode. The Intel 64 architecture adds a new IA-32e operating mode, which contains two sub-modes. One is compatibility mode, in which most IA-32 programs can run unmodified…

UNIX pseudo terminal

This post introduces some UNIX theory I have been studying recently and records the important concepts as a memo. Overview: a pseudo-terminal looks like a terminal to an application, but it is not a real terminal. Usually a process opens a pseudo-terminal master and calls fork. The child process establishes a new session, opens the corresponding pseudo-terminal slave device, copies its file descriptor to stdin, and then calls exec. The pseudo-terminal slave device becomes the controlling terminal of the child process. A pseudo-terminal looks like a bidirectional pipe, but the terminal line discipline on the slave device gives it additional processing capabilities that a normal pipe does not have. Typical uses of pseudo-terminals: the most typical examples are network login servers such as telnetd and rlogind…
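The open-master, fork, new-session, open-slave, exec sequence described above can be tried with Python's standard `pty.fork()`, which wraps those steps and returns the master file descriptor to the parent. This is a minimal sketch for a UNIX-like system; `run_in_pty` is a hypothetical helper name, and error handling is kept to the bare minimum.

```python
import os
import pty


def run_in_pty(argv: list[str]) -> bytes:
    """Run argv with a pseudo-terminal slave as its controlling terminal
    and return everything the parent reads from the master side."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout, and stderr already refer to the slave.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports EIO once the slave side is closed
        if not data:
            break  # other systems report end of file instead
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)


if __name__ == "__main__":
    print(repr(run_in_pty(["echo", "hello"])))
```

With default terminal settings the output typically ends in a carriage return plus newline where a plain pipe would deliver only a newline. That difference is the line discipline on the slave device at work.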