程式扎記: [ Py DS ] Ch1 - IPython: Beyond Normal Python

Source From Here

Keyboard Shortcuts in the IPython Shell

If you spend any amount of time on the computer, you’ve probably found a use for keyboard shortcuts in your workflow. Most familiar perhaps are Cmd-C and Cmd-V (or Ctrl-C and Ctrl-V) for copying and pasting in a wide variety of programs and systems. Power users tend to go even further: popular text editors like Emacs, Vim, and others provide users an incredible range of operations through intricate combinations of keystrokes.

The IPython shell doesn’t go this far, but does provide a number of keyboard shortcuts for fast navigation while you’re typing commands. These shortcuts are not in fact provided by IPython itself, but by the line-editing library it depends on: GNU Readline in older versions, and prompt_toolkit since IPython 5.0. Thus, some of the following shortcuts may differ depending on your system configuration. Also, while some of these shortcuts do work in the browser-based notebook, this section is primarily about shortcuts in the IPython shell.

Once you get accustomed to these, they can be very useful for quickly performing certain commands without moving your hands from the “home” keyboard position. If you’re an Emacs user or if you have experience with Linux-style shells, the following will be very familiar. We’ll group these shortcuts into a few categories: navigation shortcuts, text entry shortcuts, command history shortcuts, and miscellaneous shortcuts.

Navigation Shortcuts
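While the use of the left and right arrow keys to move backward and forward in the line is quite obvious, there are other options that don’t require moving your hands from the “home” keyboard position:

* Ctrl-a: Move the cursor to the beginning of the line
* Ctrl-e: Move the cursor to the end of the line
* Ctrl-b (or the left arrow key): Move the cursor back one character
* Ctrl-f (or the right arrow key): Move the cursor forward one character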

Text Entry Shortcuts
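While everyone is familiar with using the Backspace key to delete the previous character, reaching for the key often requires some minor finger gymnastics, and it only deletes a single character at a time. In IPython there are several shortcuts for removing some portion of the text you’re typing. The most immediately useful of these are the commands to delete entire lines of text:

* Backspace: Delete the previous character in the line
* Ctrl-d: Delete the next character in the line
* Ctrl-k: Cut text from the cursor to the end of the line
* Ctrl-u: Cut text from the beginning of the line to the cursor
* Ctrl-y: Yank (i.e., paste) text that was previously cut
* Ctrl-t: Transpose (i.e., switch) the previous two characters

You’ll know these have become second nature if you find yourself using a combination of Ctrl-b and Ctrl-d instead of reaching for the Backspace key to delete the previous character!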

Command History Shortcuts
Perhaps the most impactful shortcuts discussed here are the ones IPython provides for navigating the command history. This command history goes beyond your current IPython session: your entire command history is stored in a SQLite database in your IPython profile directory. The most straightforward way to access it is with the up and down arrow keys to step through the history, but other options exist as well:

* Ctrl-p (or the up arrow key): Access the previous command in the history
* Ctrl-n (or the down arrow key): Access the next command in the history
* Ctrl-r: Reverse-search through the command history

Miscellaneous Shortcuts
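Finally, there are a few miscellaneous shortcuts that don’t fit into any of the preceding categories, but are nevertheless useful to know:

* Ctrl-l: Clear the terminal screen
* Ctrl-c: Interrupt the current Python command
* Ctrl-d (on an empty line): Exit the IPython session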

The Ctrl-c shortcut in particular can be useful when you inadvertently start a very long-running job. While some of the shortcuts discussed here may seem a bit tedious at first, they quickly become automatic with practice. Once you develop that muscle memory, I suspect you will even find yourself wishing they were available in other contexts.

IPython Magic Commands
The previous section(s) showed how IPython lets you use and explore Python efficiently and interactively. Here we’ll begin discussing some of the enhancements that IPython adds on top of the normal Python syntax. These are known in IPython as magic commands, and are prefixed by the % character. These magic commands are designed to succinctly solve various common problems in standard data analysis. Magic commands come in two flavors: line magics, which are denoted by a single % prefix and operate on a single line of input, and cell magics, which are denoted by a double %% prefix and operate on multiple lines of input. We’ll demonstrate and discuss a few brief examples here, and come back to more focused discussion of several useful magic commands later in the chapter.

Pasting Code Blocks: %paste and %cpaste
When you’re working in the IPython interpreter, one common gotcha is that pasting multiline code blocks can lead to unexpected errors, especially when indentation and interpreter markers are involved. A common case is that you find some example code on a website and want to paste it into your interpreter. Consider the following simple function:

>>> def donothing(x):  
...     return x  

The code is formatted as it would appear in the Python interpreter, and if you copy and paste this directly into IPython you get an error:

In the direct paste, the interpreter is confused by the additional prompt characters. But never fear—IPython’s %paste magic function is designed to handle this exact type of multiline, marked-up input:

A command with a similar intent is %cpaste, which opens up an interactive multiline prompt in which you can paste one or more chunks of code to be executed in a batch:

These magic commands, like others we’ll see, make available functionality that would be difficult or impossible in a standard Python interpreter.
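To make concrete what %paste has to cope with, here is a minimal sketch in plain Python of stripping interpreter prompts from pasted text. It assumes only the standard `>>> ` and `... ` prompts appear and treats any unprompted line as printed output to be dropped; the helper name `strip_prompts` is hypothetical, and this is not IPython's own implementation.

```python
PROMPTS = (">>> ", "... ")
BARE_PROMPTS = (">>>", "...")


def strip_prompts(text: str) -> str:
    """Remove '>>> ' / '... ' markers from pasted interpreter text."""
    lines = text.splitlines()
    has_prompts = any(
        line.startswith(PROMPTS) or line.rstrip() in BARE_PROMPTS for line in lines
    )
    if not has_prompts:
        return text  # plain code: nothing to strip
    kept = []
    for line in lines:
        if line.startswith(PROMPTS):
            kept.append(line[4:])  # drop the 4-character prompt
        elif line.rstrip() in BARE_PROMPTS:
            kept.append("")        # a prompt on an otherwise empty line
        # any other line is assumed to be printed output and is dropped
    return "\n".join(kept)


pasted = """>>> def donothing(x):
...     return x"""

namespace = {}
exec(strip_prompts(pasted), namespace)  # the cleaned text is valid Python
print(namespace["donothing"](42))       # prints 42
```

Dropping unprompted lines is a design choice: it lets a pasted session that includes output (such as `>>> 1 + 1` followed by `2`) still run, at the cost of discarding any code that was pasted without prompts alongside prompted lines.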

Running External Code: %run
As you begin developing more extensive code, you will likely find yourself working both in IPython for interactive exploration and in a text editor to store code that you want to reuse. Rather than running this code in a new window, it can be convenient to run it within your IPython session. This can be done with the %run magic.

For example, imagine you’ve created a myscript.py file with the following contents:
- myscript.py

def square(x):  
    """square a number"""  
    return x ** 2  
  
for N in range(1, 4):  
    print(N, "squared is", square(N))  

You can execute this from your IPython session as follows:

In [6]: %run myscript.py
1 squared is 1
2 squared is 4
3 squared is 9

Note also that after you’ve run this script, any functions defined within it are available for use in your IPython session:

In [3]: square(5)
Out[3]: 25

There are several options to fine-tune how your code is run; you can see the documentation in the normal way, by typing %run? in the IPython interpreter.
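Outside IPython, the standard library's `runpy.run_path` is a rough analogue of %run: it executes a file as `__main__` and returns the script's global namespace. To keep the sketch self-contained, it first writes the myscript.py contents shown above into a temporary directory; `run_script` is a hypothetical helper name. Unlike %run, the functions end up in the returned dictionary rather than directly in your interactive namespace.

```python
import pathlib
import runpy
import tempfile

# Same contents as myscript.py above.
SCRIPT = '''\
def square(x):
    """square a number"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))
'''


def run_script(path):
    """Execute a script as __main__ and return its global namespace (a rough %run analogue)."""
    return runpy.run_path(str(path), run_name="__main__")


with tempfile.TemporaryDirectory() as tmp:
    script_path = pathlib.Path(tmp) / "myscript.py"
    script_path.write_text(SCRIPT)
    ns = run_script(script_path)  # prints the three "N squared is ..." lines

square = ns["square"]  # pull the script's function into this namespace
print(square(5))       # prints 25
```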

Timing Code Execution: %timeit
Another example of a useful magic function is %timeit, which will automatically determine the execution time of the single-line Python statement that follows it. For example, we may want to check the performance of a list comprehension:

In [8]: %timeit L = [n ** 2 for n in range(1000)]
1000 loops, best of 3: 325 μs per loop

The benefit of %timeit is that for short commands it will automatically perform multiple runs in order to attain more robust results. For multiline statements, adding a second % sign will turn this into a cell magic that can handle multiple lines of input. For example, here’s the equivalent construction with a for loop:

We can immediately see that list comprehensions are about 10% faster than the equivalent for loop construction in this case. We’ll explore %timeit and other approaches to timing and profiling code later.
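Outside IPython, the standard library's `timeit` module offers the same kind of measurement. The sketch below compares the list comprehension with an equivalent `for` loop; the helper name `best_per_loop` and the choice of 1,000 loops and 3 repeats are illustrative assumptions, and the numbers you get will depend on your machine and Python version.

```python
import timeit

comprehension = "L = [n ** 2 for n in range(1000)]"
for_loop = """
L = []
for n in range(1000):
    L.append(n ** 2)
"""


def best_per_loop(stmt, number=1000, repeat=3):
    """Best-of-`repeat` time per execution of stmt, in seconds."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number


t_comp = best_per_loop(comprehension)
t_loop = best_per_loop(for_loop)
print(f"list comprehension: {t_comp * 1e6:.1f} μs per loop")
print(f"for loop:           {t_loop * 1e6:.1f} μs per loop")
```

Taking the minimum over repeats follows the "best of 3" reporting shown above: the fastest run is the one least disturbed by other activity on the machine.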

Help on Magic Functions: ?, %magic, and %lsmagic
Like normal Python functions, IPython magic functions have docstrings, and this useful documentation can be accessed in the standard manner. So, for example, to read the documentation of the %timeit magic, simply type this:

In [10]: %timeit?

Documentation for other functions can be accessed similarly. To access a general description of available magic functions, including some examples, you can type this:

In [11]: %magic

For a quick and simple list of all available magic functions, type this:

In [12]: %lsmagic
Out[12]:
Available line magics:
%alias %alias_magic %autocall %autoindent %automagic %bookmark %cd %cls %colors ...

Available cell magics:
%%! %%HTML %%SVG %%bash %%capture %%cmd %%debug ...

Automagic is ON, % prefix IS NOT needed for line magics.

Finally, I’ll mention that it is quite straightforward to define your own magic functions if you wish. We won’t discuss it here, but if you are interested, see the references listed in “More IPython Resources”.

Input and Output History
Previously we saw that the IPython shell allows you to access previous commands with the up and down arrow keys, or equivalently the Ctrl-p/Ctrl-n shortcuts. Additionally, in both the shell and the notebook, IPython exposes several ways to obtain the output of previous commands, as well as string versions of the commands themselves. We’ll explore those here.

IPython’s In and Out Objects
By now I imagine you’re quite familiar with the In[1]:/Out[1]: style prompts used by IPython. But it turns out that these are not just pretty decoration: they give a clue as to how you can access previous inputs and outputs in your current session. Imagine you start a session that looks like this:

In [1]: import math

In [2]: math.sin(2)
Out[2]: 0.9092974268256817

In [3]: math.cos(2)
Out[3]: -0.4161468365471424

We’ve imported the built-in math module, then computed the sine and the cosine of the number 2. These inputs and outputs are displayed in the shell with In/Out labels, but there’s more: IPython actually creates some Python variables called In and Out that are automatically updated to reflect this history:

In [4]: print(In)
['', 'import math', 'math.sin(2)', 'math.cos(2)', 'print(In)']
In [5]: Out
Out[5]: {2: 0.9092974268256817, 3: -0.4161468365471424}

The In object is a list, which keeps track of the commands in order (the first item in the list is a placeholder so that In[1] can refer to the first command):

In [6]: print(In[1])
import math

The Out object is not a list but a dictionary mapping input numbers to their outputs (if any):

In [7]: print(Out[2])
0.9092974268256817

Note that not all operations have outputs: for example, import statements and print statements don’t affect the output. The latter may be surprising, but makes sense if you consider that print is a function that returns None; for brevity, any command that returns None is not added to Out. This becomes useful when you want to interact with past results. For example, let’s check the sum of sin(2) ** 2 and cos(2) ** 2 using the previously computed results:

In [8]: Out[2] ** 2 + Out[3] ** 2
Out[8]: 1.0

The result is 1.0 as we’d expect from the well-known trigonometric identity. In this case, using these previous results probably is not necessary, but it can become very handy if you execute a very expensive computation and want to reuse the result!

Underscore Shortcuts and Previous Outputs
The standard Python shell contains just one simple shortcut for accessing previous output; the variable _ (i.e., a single underscore) is kept updated with the previous output; this works in IPython as well:

In [9]: print(_)
1.0

But IPython takes this a bit further—you can use a double underscore to access the second-to-last output, and a triple underscore to access the third-to-last output (skipping any commands with no output):

In [10]: print(__)
-0.4161468365471424
In [11]: print(___)
0.9092974268256817

IPython stops there: more than three underscores start to get a bit hard to count, and at that point it’s easier to refer to the output by line number. There is one more shortcut we should mention, however: a shorthand for Out[X] is _X (i.e., a single underscore followed by the line number):

In [12]: Out[2]
Out[12]: 0.9092974268256817
In [13]: _2
Out[13]: 0.9092974268256817
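The bookkeeping behind Out and the underscore variables can be modeled in a few lines. The plain Python shell passes each expression's value to `sys.displayhook`, which skips None and stores anything else in `_`. The toy `OutputHistory` class below is a hypothetical illustration, not IPython's implementation: it extends that idea to an Out-style dictionary keyed by input number plus the last three results, and replays the inputs with outputs from the session above (In [5] to In [7] are omitted).

```python
import math


class OutputHistory:
    """Toy model of IPython's Out dictionary and the _, __, ___ shortcuts."""

    def __init__(self):
        self.out = {}     # input number -> result, like Out
        self.recent = []  # [_, __, ___]: most recent result first

    def record(self, input_number, value):
        # Like sys.displayhook, ignore None: imports and print() calls leave no output.
        if value is None:
            return
        self.out[input_number] = value
        self.recent = [value] + self.recent[:2]  # keep only three results


history = OutputHistory()
history.record(1, None)         # In [1]: import math
history.record(2, math.sin(2))  # In [2]: math.sin(2)
history.record(3, math.cos(2))  # In [3]: math.cos(2)
history.record(4, None)         # In [4]: print(In) returns None
history.record(8, history.out[2] ** 2 + history.out[3] ** 2)

print(sorted(history.out))      # [2, 3, 8]
underscore, double, triple = history.recent
print(double == math.cos(2), triple == math.sin(2))  # True True
```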

Suppressing Output
Sometimes you might wish to suppress the output of a statement (this is perhaps most common with the plotting commands that we’ll explore in Chapter 4). Or maybe the command you’re executing produces a result that you’d prefer not to store in your output history, perhaps so that it can be deallocated when other references are removed. The easiest way to suppress the output of a command is to add a semicolon to the end of the line:

In [14]: math.sin(2) + math.cos(2);

Note that the result is computed silently, and the output is neither displayed on the screen nor stored in the Out dictionary:

In [15]: 14 in Out
Out[15]: False

Related Magic Commands
For accessing a batch of previous inputs at once, the %history magic command is very helpful. Here is how you can print the first four inputs:

In [16]: %history -n 1-4
1: import math
2: math.sin(2)
3: math.cos(2)
4: print(In)

As usual, you can type %history? for more information and a description of options available. Other similar magic commands are %rerun (which will re-execute some portion of the command history) and %save (which saves some set of the command history to a file).
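Because the history lives in a SQLite database, it can also be read with the standard `sqlite3` module. This is a minimal sketch that assumes IPython's internal `history` table layout (columns `session`, `line`, `source`, `source_raw`) and the default profile path shown; both are implementation details rather than a public API, so %history remains the supported interface. `read_history` and `latest_session` are hypothetical helper names.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Assumed location of the default profile's history database; yours may differ.
HISTORY_DB = Path.home() / ".ipython" / "profile_default" / "history.sqlite"


def read_history(conn, session, start, stop):
    """Return [(line, source), ...] for inputs start..stop (inclusive) of one session."""
    query = (
        "SELECT line, source_raw FROM history "
        "WHERE session = ? AND line BETWEEN ? AND ? ORDER BY line"
    )
    return conn.execute(query, (session, start, stop)).fetchall()


def latest_session(conn):
    """Highest session number recorded in the history table, or None if it is empty."""
    return conn.execute("SELECT MAX(session) FROM history").fetchone()[0]


if HISTORY_DB.exists():
    with closing(sqlite3.connect(HISTORY_DB)) as conn:
        session = latest_session(conn)
        if session is not None:
            # Roughly what "%history -n 1-4" shows, for the most recent session.
            for line, source in read_history(conn, session, 1, 4):
                print(f"{line}: {source}")
```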

IPython and Shell Commands
When working interactively with the standard Python interpreter, one of the frustrations you’ll face is the need to switch between multiple windows to access Python tools and system command-line tools. IPython bridges this gap, and gives you a syntax for executing shell commands directly from within the IPython terminal. The magic happens with the exclamation point: anything appearing after ! on a line will be executed not by the Python kernel, but by the system command line.

The following assumes you’re on a Unix-like system, such as Linux or Mac OS X. Some of the examples that follow will fail on Windows, which uses a different type of shell by default (though with the 2016 announcement of native Bash shells on Windows, soon this may no longer be an issue!). If you’re unfamiliar with shell commands, I’d suggest reviewing the Shell Tutorial put together by the always excellent Software Carpentry Foundation.

Quick Introduction to the Shell
A full intro to using the shell/terminal/command line is well beyond the scope of this chapter, but for the uninitiated we will offer a quick introduction here. The shell is a way to interact textually with your computer. Ever since the mid-1980s, when Microsoft and Apple introduced the first versions of their now ubiquitous graphical operating systems, most computer users have interacted with their operating system through familiar clicking of menus and drag-and-drop movements. But operating systems existed long before these graphical user interfaces, and were primarily controlled through sequences of text input: at the prompt, the user would type a command, and the computer would do what the user told it to. Those early prompt systems are the precursors of the shells and terminals that most active data scientists still use today.

Someone unfamiliar with the shell might ask why you would bother with this, when you can accomplish many results by simply clicking on icons and menus. A shell user might reply with another question: why hunt icons and click menus when you can accomplish things much more easily by typing? While it might sound like a typical tech preference impasse, when moving beyond basic tasks it quickly becomes clear that the shell offers much more control of advanced tasks, though admittedly the learning curve can intimidate the average computer user.

Shell Commands in IPython
You can use any command that works at the command line in IPython by prefixing it with the ! character. For example, the ls, pwd, and echo commands can be run as follows:

In [1]: !ls
myproject.txt
In [2]: !pwd
/home/jake/projects/myproject
In [3]: !echo "printing from the shell"
printing from the shell

Passing Values to and from the Shell
Shell commands can not only be called from IPython, but can also be made to interact with the IPython namespace. For example, you can save the output of any shell command to a Python list using the assignment operator:

In [4]: contents = !ls
In [5]: print(contents)
['myproject.txt']
In [6]: directory = !pwd
In [7]: print(directory)
['/home/jake/projects/myproject']

Note that these results are not actually plain Python lists, but instances of a special shell return type defined in IPython:

In [8]: type(directory)
IPython.utils.text.SList

This looks and acts a lot like a Python list, but has additional functionality, such as the grep and fields methods and the s, n, and p properties that allow you to search, filter, and display the results in convenient ways. For more information on these, you can use IPython’s built-in help features. Communication in the other direction—passing Python variables into the shell—is possible through the {varname} syntax:

In [9]: message = "hello from Python"
In [10]: !echo {message}
hello from Python

The curly braces contain the variable name, which is replaced by the variable’s contents in the shell command.
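For comparison, here is roughly what `contents = !ls` and `!echo {message}` amount to when written with the standard library's `subprocess` module. The helper name `shell_lines` is hypothetical, and it returns a plain list rather than IPython's SList, so methods such as grep and fields are not available.

```python
import subprocess
import sys


def shell_lines(args):
    """Run a command given as an argument list; return its standard output as a list of lines."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()


message = "hello from Python"

# On a Unix-like system the plain-Python cousin of `!echo {message}` is:
#     shell_lines(["echo", message])  ->  ['hello from Python']
# To stay portable, this example runs the current Python interpreter as the command.
echo_script = "import sys; print(sys.argv[1])"
lines = shell_lines([sys.executable, "-c", echo_script, message])
print(lines)  # ['hello from Python']
```

Passing the value as a separate list element, rather than splicing it into a shell string as `{message}` does, sidesteps shell quoting problems when the value contains spaces or special characters.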

Shell-Related Magic Commands
If you play with IPython’s shell commands for a while, you might notice that you cannot use !cd to navigate the filesystem:

In [11]: !pwd
/home/jake/projects/myproject

In [12]: !cd ..
In [13]: !pwd
/home/jake/projects/myproject

The reason is that shell commands run with ! are executed in a temporary subshell, so a cd only affects that subshell. If you’d like to change the working directory in a more enduring way, you can use the %cd magic command:

In [14]: %cd ..
/home/jake/projects

In fact, by default you can even use this without the % sign:

In [15]: cd myproject
/home/jake/projects/myproject

This is known as an automagic function, and this behavior can be toggled with the %automagic magic function. Besides %cd, other available shell-like magic functions are %cat, %cp, %env, %ls, %man, %mkdir, %more, %mv, %pwd, %rm, and %rmdir, any of which can be used without the % sign if automagic is on. This means you can almost treat the IPython prompt as if it were a normal shell:

In [16]: mkdir tmp
In [17]: ls
myproject.txt tmp/
In [18]: cp myproject.txt tmp/
In [19]: ls tmp
myproject.txt
In [20]: rm -r tmp

This access to the shell from within the same terminal window as your Python session means that there is a lot less switching back and forth between interpreter and shell as you write your Python code.
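Why `!cd ..` has no lasting effect can be reproduced in plain Python: a command run through `subprocess` gets its own child process, and a child cannot change its parent's working directory. Only a change made inside the Python process itself persists, which is what %cd does by calling `os.chdir`. This sketch assumes a Unix-like `sh` for the `shell=True` call, as the surrounding text does.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

# Like `!cd ..`: the cd runs in a short-lived child shell that exits immediately,
# so this Python process keeps its working directory.
subprocess.run("cd ..", shell=True, check=True)
print(os.getcwd() == start)  # True

# Like `%cd`: change the working directory of this Python process itself.
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    moved_to = os.getcwd()
    os.chdir(start)  # change back so the example leaves no side effects
print(moved_to != start)  # True
```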

Errors and Debugging
Code development and data analysis always require a bit of trial and error, and IPython contains tools to streamline this process. This section will briefly cover some options for controlling Python’s exception reporting, followed by exploring tools for debugging errors in code.

Controlling Exceptions: %xmode
Most of the time when a Python script fails, it will raise an exception. When the interpreter hits one of these exceptions, information about the cause of the error can be found in the traceback, which can be accessed from within Python. With the %xmode magic function, IPython allows you to control the amount of information printed when the exception is raised. Consider the following code:

Calling func2 results in an error, and reading the printed trace lets us see exactly what happened. By default, this trace includes several lines showing the context of each step that led to the error. Using the %xmode magic function (short for exception mode), we can change what information is printed. %xmode takes a single argument, the mode, and there are three possibilities: Plain, Context, and Verbose. The default is Context, and gives output like that just shown. Plain is more compact and gives less information:

The Verbose mode adds some extra information, including the arguments to any functions that are called:

This extra information can help you narrow in on why the exception is being raised. So why not use the Verbose mode all the time? As code gets complicated, this kind of traceback can get extremely long. Depending on the context, the brevity of the default Context mode, or of Plain mode, is sometimes easier to work with.
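The difference between a compact traceback and one that also lists variable values can be imitated with the standard library's `traceback` module. In the sketch below, `func1` and `func2` are hypothetical stand-ins for the functions discussed above (they divide by zero), and `capture_locals=True` is the standard-library switch that adds each frame's local variables, roughly what %xmode Verbose adds.

```python
import traceback


def func1(a, b):
    return a / b


def func2(x):
    a = x
    b = x - 1
    return func1(a, b)


def format_error(verbose=False):
    """Call func2(1) and return its traceback text, optionally with each frame's locals."""
    try:
        func2(1)
    except ZeroDivisionError as exc:
        tb = traceback.TracebackException.from_exception(exc, capture_locals=verbose)
        return "".join(tb.format())
    return ""


print(format_error())              # file, line, and source for each frame
print(format_error(verbose=True))  # additionally lists values such as a = 1 and b = 0
```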

Debugging: When Reading Tracebacks Is Not Enough
The standard Python tool for interactive debugging is pdb, the Python debugger. This debugger lets the user step through the code line by line in order to see what might be causing a more difficult error. The IPython-enhanced version of this is ipdb, the IPython debugger. There are many ways to launch and use both these debuggers; we won’t cover them fully here. Refer to the online documentation of these two utilities to learn more.

In IPython, perhaps the most convenient interface to debugging is the %debug magic command. If you call it after hitting an exception, it will automatically open an interactive debugging prompt at the point of the exception. The ipdb prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!

Let’s look at the most recent exception, then do some basic tasks—print the values of a and b, and type quit to quit the debugging session:

The interactive debugger allows much more than this, though—we can even step up and down through the stack and explore the values of variables there:

This allows you to quickly find out not only what caused the error, but also what function calls led up to the error. If you’d like the debugger to launch automatically whenever an exception is raised, you can use the %pdb magic function to turn on this automatic behavior:

Finally, if you have a script that you’d like to run from the beginning in interactive mode, you can run it with the command %run -d, and use the next command to step through the lines of code interactively.

Partial list of debugging commands
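There are many more available commands for interactive debugging than we’ve listed here; the following list describes some of the more common and useful ones:

* l(ist): Show the current location in the file
* h(elp): Show a list of commands, or find help on a specific command
* q(uit): Quit the debugger and the program
* c(ontinue): Quit the debugger, continue in the program
* n(ext): Go to the next step of the program
* <enter>: Repeat the previous command
* p(rint): Print variables
* s(tep): Step into a subroutine
* r(eturn): Return out of a subroutine
* u(p) / d(own): Move up or down one level in the stack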

For more information, use the help command in the debugger, or take a look at ipdb’s online documentation.
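When an interactive prompt is not available (in a log or an automated test, say), the same up/down exploration can be done non-interactively by walking the traceback's frames. The sketch below uses hypothetical `func1`/`func2` functions like those discussed above and collects each frame's function name and local variables; `stack_locals` is a hypothetical helper name. In a script, the standard-library call `pdb.post_mortem(exc.__traceback__)` opens an interactive debugger on the same traceback.

```python
def func1(a, b):
    return a / b


def func2(x):
    a = x
    b = x - 1
    return func1(a, b)


def stack_locals(exc):
    """Return [(function name, local variables), ...] from the outermost frame to the innermost."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        frames.append((frame.f_code.co_name, dict(frame.f_locals)))
        tb = tb.tb_next  # one level "down", toward where the error happened
    return frames


try:
    func2(1)
except ZeroDivisionError as exc:
    stack = stack_locals(exc)

# The last two frames are the ones an interactive "up"/"down" session would visit:
# func2 with x, a and b, then func1 with a and b.
for name, local_vars in stack[-2:]:
    print(name, local_vars)
```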

Profiling and Timing Code
In the process of developing code and creating data processing pipelines, there are often trade-offs you can make between various implementations. Early in developing your algorithm, it can be counterproductive to worry about such things. As Donald Knuth famously quipped, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

But once you have your code working, it can be useful to dig into its efficiency a bit. Sometimes it’s useful to check the execution time of a given command or set of commands; other times it’s useful to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations. IPython provides access to a wide array of functionality for this kind of timing and profiling of code. Here we’ll discuss the following IPython magic commands:

* %time: Time the execution of a single statement
* %timeit: Time repeated execution of a single statement for more accuracy
* %prun: Run code with the profiler
* %lprun: Run code with the line-by-line profiler
* %memit: Measure the memory use of a single statement
* %mprun: Run code with the line-by-line memory profiler

The last four commands are not bundled with IPython—you’ll need to install the line_profiler and memory_profiler extensions, which we will discuss in the following sections.

Timing Code Snippets: %timeit and %time
We saw the %timeit line magic and %%timeit cell magic used earlier. %timeit can be used to time the repeated execution of snippets of code:

In[1]: %timeit sum(range(100))
100000 loops, best of 3: 1.54 μs per loop

Note that because this operation is so fast, %timeit automatically does a large number of repetitions. For slower commands, %timeit will automatically adjust and perform fewer repetitions:

Sometimes repeating an operation is not the best option. For example, if we have a list that we’d like to sort, we might be misled by a repeated operation. Sorting a presorted list is much faster than sorting an unsorted list, so the repetition will skew the result:

In[3]: import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
100 loops, best of 3: 1.9 ms per loop

For this, the %time magic function may be a better choice. It is also a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result. Let’s time the sorting of an unsorted and a presorted list:

Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with %time versus %timeit, even for the presorted list! This is a result of the fact that %timeit does some clever things under the hood to prevent system calls from interfering with the timing. For example, it prevents cleanup of unused Python objects (known as garbage collection) that might otherwise affect the timing. For this reason, %timeit results are usually noticeably faster than %time results.

For %time as with %timeit, using the double-percent-sign cell-magic syntax allows timing of multiline scripts.
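The sorting pitfall can be reproduced with the standard library: `time.perf_counter` gives a single %time-style measurement, while `timeit.timeit` repeats the statement and, by default, temporarily turns off garbage collection while it times, one of the "clever things" mentioned above. The list size of 100,000 follows the earlier example; `time_once` is a hypothetical helper name, and the timings will vary by machine.

```python
import random
import time
import timeit


def time_once(func):
    """Single wall-clock measurement of func(), similar in spirit to %time."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


L = [random.random() for i in range(100000)]
unsorted_time = time_once(L.sort)   # first sort: the list is genuinely unsorted
presorted_time = time_once(L.sort)  # second sort: the list is already in order
print(f"unsorted:  {unsorted_time * 1e3:.2f} ms")
print(f"presorted: {presorted_time * 1e3:.2f} ms")

# Repeating the sort, as %timeit does, mostly measures the fast presorted case.
L = [random.random() for i in range(100000)]
repeated = timeit.timeit(L.sort, number=100) / 100
print(f"repeated average: {repeated * 1e3:.2f} ms")
```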

Profiling Full Scripts: %prun
A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own. Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function %prun.

By way of example, we’ll define a simple function that does some calculations:

def sum_of_list(N):  
    total = 0  
    for i in range(5):  
        L = [j ^ (j >> i) for j in range(N)]  
        total += sum(L)  
  
    return total  

Now we can call %prun with a function call to see the profiled results:

         14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_list)
        1    0.014    0.014    0.714    0.714 <string>:1(<module>)
        1    0.000    0.000    0.714    0.714 {built-in method exec}

The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside sum_of_list. From here, we could start thinking about what changes we might make to improve the performance in the algorithm.
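%prun drives the standard library's profiler, so the same kind of table can be produced in a plain script with `cProfile` and `pstats`. The sketch repeats the `sum_of_list` definition so that it is self-contained; N = 100,000 is an arbitrary, smaller size chosen to keep it quick.

```python
import cProfile
import io
import pstats


def sum_of_list(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


profiler = cProfile.Profile()
profiler.enable()
sum_of_list(100000)
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("tottime")
stats.print_stats(10)  # at most ten entries, ordered by internal time like %prun
print(report.getvalue())
```

On Python 3.12 and later, list comprehensions are inlined into the enclosing function, so the separate `<listcomp>` row disappears and its time is reported under `sum_of_list` itself.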

Line-by-Line Profiling with %lprun
The function-by-function profiling of %prun is useful, but sometimes it’s more convenient to have a line-by-line profile report. This is not built into Python or IPython, but there is a line_profiler package available for installation that can do this. Start by using Python’s packaging tool, pip, to install the line_profiler package:

# pip install line_profiler

Next, you can use IPython to load the line_profiler IPython extension, offered as part of this package:

In[9]: %load_ext line_profiler

Now the %lprun command will do a line-by-line profiling of any function—in this case, we need to tell it explicitly which functions we’re interested in profiling:

In[10]: %lprun -f sum_of_list sum_of_list(5000)

As before, the notebook sends the result to the pager, but it looks something like this:

The information at the top gives us the key to reading the results: the time is reported in microseconds and we can see where the program is spending the most time. At this point, we may be able to use this information to modify aspects of the script and make it perform better for our desired use case.
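If installing line_profiler is not an option, a crude per-line timer can be sketched with the standard library's `sys.settrace` hook. This is a swapped-in technique, not how line_profiler itself is implemented; the tracing overhead inflates every number, so only the relative sizes are meaningful. `profile_lines` is a hypothetical helper name, and it does not handle recursive functions.

```python
import sys
import time
from collections import defaultdict


def profile_lines(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, {line number: seconds spent on that line})."""
    code = func.__code__
    totals = defaultdict(float)
    state = {"line": None, "since": 0.0}

    def local_tracer(frame, event, arg):
        now = time.perf_counter()
        if state["line"] is not None:
            # Charge the time since the previous event to the line that was running.
            totals[state["line"]] += now - state["since"]
        if event == "line":
            state["line"] = frame.f_lineno
        elif event == "return":
            state["line"] = None
        state["since"] = now
        return local_tracer

    def global_tracer(frame, event, arg):
        # Trace only frames running func's own code; time spent in functions it
        # calls is charged to the calling line.
        if event == "call" and frame.f_code is code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, dict(totals)


def sum_of_list(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


result, times = profile_lines(sum_of_list, 5000)
first = sum_of_list.__code__.co_firstlineno
for lineno in sorted(times):
    print(f"line {lineno - first}: {times[lineno] * 1e6:10.1f} μs")
```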

Profiling Memory Use: %memit and %mprun
Another aspect of profiling is the amount of memory an operation uses. This can be evaluated with another IPython extension, the memory_profiler. As with the line_profiler, we start by pip-installing the extension:

# pip install memory_profiler

Then we can use IPython to load the extension:

In[12]: %load_ext memory_profiler

The memory profiler extension contains two useful magic functions: the %memit magic (which offers a memory-measuring equivalent of %timeit) and the %mprun function (which offers a memory-measuring equivalent of %lprun). The %memit function can be used rather simply:

In[13]: %memit sum_of_list(1000000)
peak memory: 100.08 MiB, increment: 61.36 MiB

We see that the peak memory of the process during this call is about 100 MB, of which about 61 MB is the increment attributable to the call itself.
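A rough standard-library stand-in for %memit is `tracemalloc`, with one important difference: it counts only memory allocated through Python's allocators, not the whole process's memory as memory_profiler reports, so its figures are smaller and not directly comparable. `peak_increment` is a hypothetical helper name, N = 100,000 is chosen to keep the run quick, and `tracemalloc.reset_peak` requires Python 3.9 or later.

```python
import tracemalloc


def sum_of_list(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def peak_increment(func, *args, **kwargs):
    """Return (result, peak bytes allocated by Python during the call above the starting level)."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    tracemalloc.reset_peak()  # Python 3.9+
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return result, peak - baseline


result, peak = peak_increment(sum_of_list, 100000)
print(f"peak increment: {peak / 2**20:.2f} MiB")
```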

For a line-by-line description of memory use, we can use the %mprun magic. Unfortunately, this magic works only for functions defined in separate modules rather than the notebook itself, so we’ll start by using the %%file magic to create a simple module called mprun_demo.py, which contains our sum_of_list function, with one addition that will make our memory profiling results more clear:

We can now import the new version of this function and run the memory line profiler:

In[15]: from mprun_demo import sum_of_list
%mprun -f sum_of_list sum_of_list(1000000)

The result, printed to the pager, gives us a summary of the memory use of the function, and looks something like this:

Here the Increment column tells us how much each line affects the total memory budget: observe that creating and deleting the list L accounts for a change of about 25 MB of memory usage. This is on top of the background memory usage from the Python interpreter itself.

Saturday, July 14, 2018
