程式扎記: [Linux Article Collection] How process input and output works in Linux


Sunday, April 12, 2015


Source From Here 
Preface 
With a basic knowledge of stdout, stderr, and stdin, you can usually get through your daily tasks, but by learning a little more of how those things work under the hood, you can put them to more interesting, and powerful uses. 

Let's start by taking a look at how commands and the shell really work, and then we'll move on to manipulating the input and output of any running process. 

The bash shell (or a command) and its input and output 
Contrary to popular belief, the bash redirection operators do not need to appear at the end of the line; they may appear anywhere on the line, even mixed in with the command line arguments. The following is perfectly valid: 
# 2>err ls -l 1>out foo nofile

and is equivalent to: 
# ls -l foo nofile 1>out 2>err

Both commands produce the same results: a long listing for the existing file "foo" is placed in a file called "out" and an error message regarding non-existing file "nofile" is placed into a file called "err." 

What the bash redirection operators actually do is replace and/or duplicate the file descriptors associated with the command being run. In this case our command is the "ls" process: our bash shell "execve"s this program, and an environment including the file descriptors is created for the duration that the ls program runs. There is a file descriptor for standard input (0), standard output (1), and standard error (2). You may see them by determining the process id of the program and listing the /proc/[pid]/fd/ directory. You will see that files "0", "1", and "2" are symbolic links to other files on the system, such as device files associated with pseudo-terminals (your screen). In our example, fd "1" is changed to point to the file "out" instead of the terminal screen, and similarly for fd "2". 
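You can verify this yourself without a second terminal. The following sketch (file names and the use of `sleep` as a stand-in command are just for this demo) starts a backgrounded command with redirected output and then reads its file descriptors from /proc:

```shell
#!/bin/sh
# Run a long-lived command with stdout and stderr redirected, then inspect
# its file descriptors under /proc (Linux-only; paths are demo choices).
sleep 30 1>/tmp/out.demo 2>/tmp/err.demo &
pid=$!
ls -l /proc/$pid/fd/        # 1 -> /tmp/out.demo, 2 -> /tmp/err.demo
readlink /proc/$pid/fd/1    # prints: /tmp/out.demo
kill $pid                   # clean up the background process
```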

One more example: 
$ ftp ftp.openbsd.org 2>erro 1>out # All output will be redirected to the erro and out files

Open another terminal: 
$ ps -aux | grep ftp
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 3480 0.0 0.1 116124 1144 pts/2 S+ 18:27 0:00 ftp ftp.openbsd.org
root 3482 0.0 0.0 103252 836 pts/3 S+ 18:27 0:00 grep ftp

$ ls -hl /proc/3480/fd/
total 0
lrwx------. 1 root root 64 Jan 6 18:27 0 -> /dev/pts/2
l-wx------. 1 root root 64 Jan 6 18:27 1 -> /root/test/out
l-wx------. 1 root root 64 Jan 6 18:27 2 -> /root/test/erro
lrwx------. 1 root root 64 Jan 6 18:27 3 -> socket:[29402]

If you want to have a little fun, open a couple of different windows (I recommend screen, or even better, tmux), use the "tty" command to determine the terminal device associated with each window, and then from within one window, launch a new bash process and redirect its output to the other window, as follows:
$ 2>/dev/pts/2 bash

This launches a bash shell whose error output is sent to the second window. Now type commands in the new bash shell and see what happens. Standard error should appear in the other window and standard output should appear in the original window. Determine the pid of the new shell and list the /proc/[pid]/fd folder and see what you find there. 

Now a word on pipes. Like the redirection operator, a pipe also modifies file descriptors. If you have "command1 | command2", the pipe changes the stdout (fd 1) of command1 and points it to a new pipe. At the same time, it changes the stdin (fd 0) of command2 and points it to that same pipe. It does this before any other redirection operators are evaluated. You can see this for yourself by typing the following commands: 
$ tty
/dev/pts/2
$ ps aux | grep " bash" | grep pts\/2 | egrep -v "grep"
root 3736 0.0 0.1 108472 1904 pts/2 S 18:47 0:00 bash
$ ps aux | grep "cat" | grep pts\/2 | egrep -v "grep"
root 3737 0.0 0.0 100936 560 pts/2 S 18:47 0:00 cat -
$ ll /proc/3736/fd
total 0
lrwx------. 1 root root 64 Jan 6 18:49 0 -> /dev/pts/2
l-wx------. 1 root root 64 Jan 6 18:49 1 -> pipe:[31227]
lrwx------. 1 root root 64 Jan 6 18:47 2 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 6 18:49 255 -> /dev/pts/2

$ ll /proc/3737/fd
total 0
lr-x------. 1 root root 64 Jan 6 18:48 0 -> pipe:[31227]
lrwx------. 1 root root 64 Jan 6 18:48 1 -> /dev/pts/2
lrwx------. 1 root root 64 Jan 6 18:47 2 -> /dev/pts/2

You see that fd 1 for bash, and fd 0 for cat, are both pointed to the same pipe. 
It's important to realize that the redirection operators are evaluated from left to right. This matters particularly when you want to swap stdout and stderr. 

Open a new bash shell, cd into an empty directory, and issue the following commands: 
$ touch foo
$ ls foo nofile
ls: cannot access nofile: No such file or directory
foo

Both standard output and standard error are going to the same place - the local terminal screen. Now issue the following commands: 
# tty
/dev/pts/2
# 3>&2 2>&1 1>&3 3>&- ls foo nofile
ls: cannot access nofile: No such file or directory
foo

You see the same output as before, but something different must be going on because we are using a lot of redirects. Evaluating from left to right, we create a new file descriptor 3 that points to the same place 2 currently points (/dev/pts/2); then we make fd 2 point to where fd 1 currently points (also /dev/pts/2); then we make fd 1 point to where 3 currently points (again, /dev/pts/2). And we close file descriptor 3 for good measure since we aren't using it anymore. We have just swapped standard output and standard error, although everything still goes to the same place (/dev/pts/2) and so we don't notice any difference in the output. So we didn't really achieve anything, except now we know how to swap stdout and stderr. 
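You can prove the swap really happens by capturing fd 1 with command substitution. This is a sketch; the `emit` helper is invented for the demo:

```shell
#!/bin/sh
# Verify that "3>&2 2>&1 1>&3 3>&-" swaps stdout and stderr.
# emit writes one line to fd 1 and one line to fd 2.
emit() { echo to-stdout; echo to-stderr >&2; }

# Command substitution captures fd 1. After the swap, what emit wrote to
# *stderr* arrives on fd 1, so that is what gets captured:
captured=$( { emit 3>&2 2>&1 1>&3 3>&- ; } 2>/dev/null )
echo "$captured"    # prints: to-stderr
```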

This swap can be useful. Consider the following: 
# 3>&2 2>&1 1>&3 3>&- ls foo nofile | cat - >out
foo
# cat out
ls: cannot access nofile: No such file or directory

1. The first thing to happen is the creation of a pipe, then fd 1 for "ls" and fd 0 for "cat" are both pointed to this pipe.
2. 3>&2: Then, on the "ls" side the following: new file descriptor 3 is created and pointed to where fd 2 points, which is /dev/pts/2.
3. 2>&1: Then file descriptor 2 is pointed to where 1 currently points, which is the pipe.
4. 1>&3: Then file descriptor 1 is pointed to where 3 points which is /dev/pts/2.
5. 3>&-: Then fd 3 is closed. Then the "ls" command is run.

The error output (fd 2) goes into the pipe. The standard output (fd 1) goes to the screen (/dev/pts/2). Meanwhile, on the "cat" side, the standard input is provided by the pipe, and we are redirecting the standard output to a file called "out". Thus, cat receives the error output from "ls" on the pipe and treats it as its standard input (fd 0). Cat copies its input to its output, which in this case is the file called "out". In summary: the standard output from "ls" is echoed to the screen and the error output is piped into the cat command on the right-hand side of the pipe. 

Once you understand this process, many possibilities open up. For example, if you would like to pipe stdout into one command and pipe stderr into a different command, this is easily achieved by creating your own pipes and pointing the file descriptors at them, as shown in the next section. 
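As an aside, bash specifically can also do this separation in one line with process substitution, where the shell creates and connects the pipes for you. The `sed` tags and /tmp file names below are just illustrative choices for this sketch:

```shell
#!/bin/bash
# bash process substitution: stdout flows into one command and stderr
# into another; bash creates the underlying pipes itself.
touch foo
ls foo nofile \
    1> >(sed 's/^/OUT: /' > /tmp/out.tagged) \
    2> >(sed 's/^/ERR: /' > /tmp/err.tagged)
sleep 1                     # give the substitution processes time to finish
cat /tmp/out.tagged         # OUT: foo
grep -c 'ERR:' /tmp/err.tagged   # 1 (the single ls error line, tagged)
```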

Using pipes to capture different output streams 
Since you know now that pipes and redirection operators are nothing more than manipulation of file descriptors, you can create and use your own pipes as you see fit. Let's run a command, and pipe its standard output into another command, and its standard error into yet a different command. 
1. First let's open up three tmux windows 

2. Next let's create two pipes 
# mkfifo /tmp/myfifo1; mkfifo /tmp/myfifo2


3. Now in window1 launch an ls command whose standard output goes into pipe1 and whose standard error goes into pipe2: 
# 1>/tmp/myfifo1 2>/tmp/myfifo2 ls foo nofile

When you hit return, nothing happens; that's because opening a fifo for writing blocks until the other end is opened for reading. 

4. In window 2, launch a cat process whose standard input is the first pipe: 
# cat < /tmp/myfifo1

This also appears to wait: cat has opened the pipe, but ls will not write anything until its other fifo has a reader too. 

5. In window 3, launch a second cat process that reads from the second pipe: 
# cat < /tmp/myfifo2

Hitting return on this process will complete the chain, and all three processes will terminate. 

The expected results: no output from the ls command appears in window 1 (top); the standard output from the ls command appears in window 2 (middle); and the standard error from the ls command appears in window 3 (bottom). So now you know how to treat stderr and stdout independently and use them in any manner you see fit. 
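The three-window exercise can also be collapsed into a single shell by running the readers in the background, which makes the plumbing easy to verify. This is a sketch; the output file names are invented for the demo:

```shell
#!/bin/sh
# Single-shell version of the exercise: background cat processes play the
# role of windows 2 and 3, so the foreground ls does not block on the fifos.
mkfifo /tmp/myfifo1 /tmp/myfifo2
cat /tmp/myfifo1 > /tmp/ls.out &    # "window 2": reads ls's stdout
cat /tmp/myfifo2 > /tmp/ls.err &    # "window 3": reads ls's stderr
touch foo
ls foo nofile 1>/tmp/myfifo1 2>/tmp/myfifo2 || true  # nofile is meant to fail
wait                                # readers exit once ls closes the fifos
cat /tmp/ls.out                     # foo
cat /tmp/ls.err                     # the "No such file or directory" message
rm /tmp/myfifo1 /tmp/myfifo2
```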
 

stderr to screen, stdout and stderr to file 
Here is how to run one or more commands, capturing the standard output and error, in the order in which they are generated, to a logfile, while displaying only the standard error on any terminal screen you like. 
1. Open two windows (shells) 

2. Create some test files: 
# touch /tmp/foo /tmp/foo1 /tmp/foo2


3. in window1: 
# mkfifo /tmp/fifo
# cat /tmp/fifo > /tmp/logfile

4. Then, in window2: 
# (ls -l /tmp/foo /tmp/nofile /tmp/foo1 /tmp/nofile /tmp/nofile; echo successful test; ls /tmp/nofile1111) 2>&1 1>/tmp/fifo | tee /tmp/fifo 1>/dev/pts/1

The subshell runs some "ls" and "echo" commands in sequence, such that some succeed (providing stdout) and some fail (providing stderr) in order to generate a mingled stream of output and error messages, so that you can verify the correct ordering in the log file. 

The ordering of output and error is preserved, the syntax is simple and clean, and there is only a single reference to the output file. Plus there is flexibility in putting the extra copy of stderr wherever you want. Once you understand how it works, replace the "ls" and "echo" commands with scripts or commands of your choosing.
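The two-window recipe can likewise be run in a single shell by backgrounding the fifo reader. In this sketch tee's stdout simply goes to the current terminal rather than /dev/pts/1; the echo lines stand in for your real commands:

```shell
#!/bin/sh
# Single-shell sketch: stdout and stderr both land in /tmp/logfile in the
# order generated, while stderr alone is also copied to the terminal by tee.
mkfifo /tmp/fifo
cat /tmp/fifo > /tmp/logfile &      # the "window 1" reader
{ echo normal output; echo an error 1>&2; } \
    2>&1 1>/tmp/fifo | tee /tmp/fifo
wait                                # reader exits when both writers close
rm /tmp/fifo
```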

How to take control of a running process's stdin/stdout/stderr 
The above discussion prepares us for an even more interesting task: how to manipulate the input and output of processes that are already running. 

Let's say you are physically at the console of your Linux server: you log in, run a bunch of commands, and walk away without closing the shell (let's ignore any security implications). Later, away from the server room, you log in remotely and would like to view the command history from earlier; perhaps you need to know exactly which options you used for a particular program. How do you retrieve this command history? This is a bit of a dilemma because: 
1. You don't have physical access to the console
2. The bash shell is still running which means it holds the command history in memory (it has not yet written to .bash_history)
3. There is no signal you can send to the bash process to cause it to dump its history
4. Even if you could dump the history, there's no guarantee it won't be overwritten by other bash processes on the system before you have a chance to review it
5. etc

The way to deal with this is to identify the bash process in question (the target shell). As long as you have access to another shell on the system, whether via ssh or other means, you can take control of the standard input and output of the target shell. Then you can issue a "history" command, which will retrieve the history from memory and display it. You can also issue any other commands you like, because you now interact with the target shell. In layman's terms, you "redirect the standard input and output" of the target shell to the shell you have access to. 

This procedure actually allows you to control the input and output of any running process on the system - shell or otherwise - which means its uses extend beyond our example. As long as you understand the principle, you can modify it to suit your needs. 

The procedure works on Linux systems and requires gdb (the GNU debugger). GDB can attach to a running process and modify its parameters; in our case we use it to change the file descriptors for the process's stdin, stdout, and stderr. A couple of points to keep in mind: when GDB attaches to a process, it suspends execution of the process in the same way as SIGSTOP, and when GDB detaches, execution resumes in the same way as SIGCONT. 

For any process on a Linux system, if you know its pid, you may examine its file descriptors with "ls -l /proc/[pid]/fd/". This brings up a question: if the filesystem gives you access to the file descriptors, why don't you just modify the files in /proc/[pid]/fd/ instead of using GDB? Two reasons: 
1. The kernel does not let you. If you try to modify those files, regardless of whether you own them or are root, you will get "Permission denied."
2. More importantly, you should not be modifying file descriptors on a running process anyway. You first need to suspend the process - whether by GDB, SIGSTOP, or Ctrl-Z (which is just a shortcut way to send SIGSTOP) - and then make your changes.
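The suspend/resume behaviour gdb relies on can be observed with plain signals. A sketch, with `sleep` standing in for any target process:

```shell
#!/bin/sh
# SIGSTOP/SIGCONT demo: the same suspend/resume that gdb performs
# implicitly when it attaches to and detaches from a process.
sleep 60 &
pid=$!
kill -STOP $pid             # suspend, as gdb attach does
ps -o stat= -p $pid         # process state contains "T" (stopped)
kill -CONT $pid             # resume, as gdb detach does
kill $pid                   # clean up
```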

Procedure 
This procedure makes use of "screen" or "tmux", which is not strictly necessary, but makes life easier. 

1. ssh into the server, launch a screen or tmux session, and open a few windows. You can determine the pseudo-device for each window by typing "tty". Let's say we have /dev/pts/1, /dev/pts/2, and /dev/pts/3 corresponding to the three shells we're running in the screen or tmux session 

2. determine the pid of the target bash process. This process is currently associated with a terminal device such as /dev/tty1, because that was the device associated with mingetty when you logged in at the console 

3. from screen window 1, run "gdb -p [pid]" and run the following commands within gdb: 
# p dup2(open("/dev/pts/2",0),0) # this changes the standard input for the target process (0 = O_RDONLY)
# p dup2(open("/dev/pts/3",1),1) # this changes the standard output for the target process (1 = O_WRONLY)
# p dup2(open("/dev/pts/3",1),2) # this changes the standard error for the target process
# detach
# quit

Step 3 redirects the standard input to Window 2 and the output to Window 3. stdin is opened read-only. stdout and stderr are opened writable. When you detach and quit gdb, the target shell resumes execution with the new file descriptors. 

4. from window 1 (/dev/pts/1), run "ls -l /proc/[pid]/fd" to verify the file descriptor changes for the bash process we want to manipulate 

What you type in window 2 is now fed to two places: the bash shell launched with window 2, and the target shell. Therefore, from window 2 (/dev/pts/2), type "hhiissttoorryy[return][return]". The reason you have to type everything twice is that the input is divvied out to both the current bash shell and the target bash shell: the operating system knows there are two processes reading the keyboard input for /dev/pts/2, and it distributes the characters you type fairly, in round-robin fashion. The first character goes to one destination, the next goes to the other, and so on. If you had three processes reading from the same terminal, you would have to type each character three times to ensure the target shell receives the full command; otherwise it would only get every third character. 

The problem with the above step is that it runs "history" in both shells. You can get around this by temporarily setting the stdin for the window 2 bash shell to some unused device, such as /dev/tty5. This means the /dev/pts/2 keyboard is now associated as input with only one process (instead of two). Now you can type commands as normal (they just won't be echoed to the window 2 screen; they will be echoed to /dev/pts/3 instead, because that's where you've redirected the stdout). 

Since you typed 'history' and it was fed to the target process, whose stdout is window 3, switch to window 3 to see the command output. Now you have the command history for the target bash shell. Back in window 1, use gdb again on [pid] to reset the standard in/out/err of the target shell back to their original values (/dev/tty1), and set the window 2 stdin back to what it should be. 

If you like, you can clean up the extra file descriptors by closing them with 'exec 3>&-' (which closes fd 3, for example). 

You can probably make the above even easier and more user-friendly by using a single window for both input and output: open the window, determine its tty or pty, temporarily set its shell's standard in and out to an unused device, then assign the target shell's standard in/out to this tty or pty. This way you can use a single screen for all input and output with the target shell.
