
Quality Assurance at Cloudera: Highly-Controlled Disk Injection


Recently added fault-injection techniques are making quality assurance processes even more rigorous.

In a previous installment of our series about quality assurance inside Cloudera, we described the fault-injection frameworks (AgenTEST and Sapper) that Cloudera Engineering has devised. The fault-injection framework determines when injections occur, by starting and stopping them, and how they occur.

On that occasion, we presented a number of disk-related injections implemented in AgenTEST, including:

- BurnIO: Runs disk-intensive processes, simulating a noisy neighbor or a faulty disk. It is possible to specify the amount of IOPS to burn and the mount point to be affected.
- FillDISK: Writes a huge file to the specified root device, filling up the root disk. It is possible to specify the percentage of free disk space to consume.
- CorruptHDFS: Corrupts one HDFS file, using the size and offset specified as input.
- UNMOUNT: Unmounts one mount point of a device.
- RONLY: Remounts the specified mount point of a device as read-only.

Although these injections are useful, they act only at the mount-point level. Given this coarse control (for example, simulating a noisy neighbor or a faulty disk), they cannot guarantee that they actually interfere with the application under test.

In this post, we present a new set of low-level, highly-controlled disk injections (HCDIs), recently added to Cloudera's fault-injection portfolio, that do guarantee such interference. With these new injections, we can now target specific files and/or folders, which was not possible before. We can also decide in a fine-grained way when an injection should occur: while opening, reading, writing, or closing a file, or any combination of these. Finally, the injection parameters are much more expressive; for example, we can introduce a specific delay (in ms) or simulate a specific error when a file is accessed, and we can specify the probability with which the injection occurs.

New Injections and Configuration

These new HCDIs include:

- DDELAY (disk access delay): Introduces a configurable latency on accesses to a specific file (or to the files in a folder). It is possible to specify the access modes to intercept (opening, reading, writing, and/or closing the file), the probability that the injection occurs, and the delay in ms.
- DCORRUPT (disk data corruption): Corrupts a configurable percentage of the data read from or written to a file (or the files in a folder). It is possible to specify the access modes to target (reading and/or writing), the probability that the injection occurs, and the percentage of bytes to corrupt on each access.
- DFAIL (disk access failure): Simulates failures while accessing a specific file (or the files in a folder). As with the other injections, it is possible to specify the access modes to target (O, R, W, C for open, read, write, close), the probability that the injection occurs, and the error code to return when the injection is hit.

AgenTEST activates and deactivates these injections following a mechanism explained here. In particular, the injection to apply is encoded in the name of a file. For example, assume that AgenTEST is watching the folder /tmp/AgenTEST-inj and that we run:

touch /tmp/AgenTEST-inj/DDELAY~.foo.bar~RW~100~5

AgenTEST will introduce a delay of 5ms on every (100% probability) read and write operation on the files in /foo/bar. The injection stays in place until we delete the file:

rm /tmp/AgenTEST-inj/DDELAY~.foo.bar~RW~100~5

Particularly interesting is the DFAIL injection:

touch /tmp/AgenTEST-inj/DFAIL~.foo.bar~RW~50~5

The last parameter is the error code to return when the injection is hit. In this case, the injection returns an I/O error (code 5) on half (50% probability) of the read and write accesses.

The table below lists all possible reported codes:


[Table: possible error codes reported by DFAIL]
AgenTEST using HCDI

There are two requirements for using HCDIs:

- Setting the LD_PRELOAD variable (explained in more detail below)
- Providing the configuration file, HCDI_CONFIG, which defaults to ~/.hcdi

The HCDI_CONFIG file (see example below) contains all the parameters needed to determine what and when to inject. This configuration file can change dynamically, and the injection will adjust accordingly.

# FailedOp:/path/:RWOC:<probability%>:<errorcode>
FailedOp:/tmp/foo:R:10:5
FailedOp:/tmp/foo2:RW:10:7

Essentially, AgenTEST serves as a “front-end” for these HCDIs, and every time that a new injection is required, it updates the configuration file. However, modifications could also be made by hand, e.g.:

# Line added to .hcdi by AgenTEST when DFAIL~.foo.bar~RWO~100~5 is created in /tmp/AgenTEST-inj/
FailedOp:/foo/bar:RWO:100:5

How Does It Work?

Executable programs depend on a number of shared libraries (unless they are statically linked). To see which libraries a program is linked against, you can list its dependencies with the Linux ldd command.

For example, /bin/date depends on the following libraries:

$ ldd /bin/date
	linux-vdso.so.1 => (0x00007fff571ff000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd3db1e0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd3dae21000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd3dac03000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd3db3fa000)

When a program is executed, the dynamic linker looks at this list of libraries. It locates the libraries on the filesystem based on configuration files and environment variables, loads them into memory, links the pieces together into a working program, and then executes the application.



The dynamic linker also provides a way to pre-load a library that gets inserted first into the chain to selectively override functions provided by other shared libraries. On Linux, this feature is available via the LD_PRELOAD environment variable.

With HCDIs, the idea is to intercept function calls, messages, or events passed between software components and to use a custom implementation, or hook, to manipulate them.



This “hooking” approach has three main advantages:

- There is no need to find the function's definition in the library (such as libc) and change it.
- There is no need to recompile the library's source code.
- The application itself doesn't know that its calls are being intercepted.

The libc.so.6 in the ldd output is the C runtime library; it provides standard functions such as malloc(), printf(), and localtime(). To override a particular function, we simply build a shared library that exports a function with the same name. Inside it, we look up the original definition with dlsym and delegate to it, if needed.

For example:

#define _GNU_SOURCE
#include <time.h>
#include <dlfcn.h>
#include <stdio.h>

struct tm *(*orig_localtime)(const time_t *timep);

struct tm *localtime(c
