[Translated] A Guide to Python Profiling and Optimization

This article comes from https://uwpce-pythoncert.github.io/SystemDevelopment/profiling.html; the translator has made some adjustments and additions.

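Preface: Profiling and Optimization

What is performance?

"Performance" can be a measure of any of the following:

Resource usage (CPU, memory)
How often functions are called and how long they take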

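Optimization steps

Write maintainable, readable code
Test it
Collect performance statistics
If it is fast enough, you are done.
Otherwise, use the data to optimize the most expensive parts.

Focus optimization on the core of the program. For non-core parts, attempts to make them faster can do real harm once debugging and maintenance costs are taken into account.

Premature optimization is the root of all evil. - Donald Knuth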

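Optimization methods

Efficient algorithms (big O, etc.)
Appropriate Python data types, etc.
Idiomatic Python conventions
Specialized packages (numpy, scipy)
Calling external packages
Using C/C++/Fortran/Cython, etc.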

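What is big O?

The efficiency of an algorithm is usually described with "O" notation. The letter O is used because the growth rate of a function is also called its order. The notation describes how the resources an algorithm uses grow as a function of the size of its input.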

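$O(1)$ - (constant) execution time stays the same no matter how much data is provided. E.g., adding a key-value pair to a dict (on average).
$O(n)$ - execution time rises linearly as the input grows. E.g., iterating over a list.
$O(n^2)$ - execution time rises quadratically as the input grows. E.g., bubble sort in the worst case.
$O(\log n)$ - execution time grows with the logarithm of the input size. E.g., binary (bisection) search.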
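To make the difference between O(n) and O(log n) concrete, here is a small sketch that counts comparisons instead of measuring time. `linear_search` and `binary_search` are hypothetical helpers written for illustration. For real code, the standard library's `bisect` module provides the bisection step.

```python
def linear_search(items, target):
    """Scan left to right: O(n) comparisons in the worst case."""
    steps = 0
    for index, item in enumerate(items):
        steps += 1
        if item == target:
            return index, steps
    return -1, steps


def binary_search(items, target):
    """Bisect a sorted list: O(log n) comparisons."""
    lo, hi = 0, len(items)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid       # target is at mid or in the left half
    if lo < len(items) and items[lo] == target:
        return lo, steps
    return -1, steps


data = list(range(1_000_000))            # sorted input, n = 1,000,000
print(linear_search(data, 999_999))      # → (999999, 1000000)
print(binary_search(data, 999_999))      # index 999999, in at most 20 steps
```

Doubling `n` doubles the step count of the linear scan, but adds only about one step to the bisection.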

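Timing function execution

The simplest approach is to use Python's built-in timers. Start the clock when a unit of code (such as a function) begins, and stop it when the code returns.

As with most timing benchmarks, the numbers are only valid for the specific test environment (machine / OS / Python version). For example, the same program can give very different results on a machine with a fast network and a slow disk than on one with a slow network and a fast disk.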

time.clock() / time.time()

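Use the time module inside a decorator.

time.time() returns the Unix system time (wall-clock time).
time.clock() returned the CPU time of the current process on Unix; on Windows it measured wall-clock time. It was deprecated in Python 3.3 and removed in Python 3.8. Use time.process_time() for CPU time and time.perf_counter() for precise wall-clock intervals.

Simple, but it gives an intuitive picture of what is going on.

Example: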

import functools
import time

def timer(func):
    """A decorator that prints the execution time of the decorated function."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        t1 = time.perf_counter()  # high-resolution wall-clock timer
        result = func(*args, **kwargs)
        t2 = time.perf_counter()
        print("-- executed %s in %.4f seconds" % (func.__name__, t2 - t1))
        return result
    return wrapper

@timer
def expensive_function():
    time.sleep(1)

@timer
def less_expensive_function():
    time.sleep(.02)

expensive_function()
less_expensive_function()
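`time.clock()` no longer exists in Python 3.8+. This minimal sketch shows the wall-clock vs CPU-time distinction using `time.perf_counter()` and `time.process_time()` instead. `time.sleep()` waits without using the CPU, so it shows up in wall-clock time but barely in CPU time. The helper names `measure` and `busy_loop` are illustrative only.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()   # high-resolution wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def busy_loop(n):
    """Keep the CPU busy, so wall-clock and CPU time should be close."""
    total = 0
    for i in range(n):
        total += i
    return total


sleep_wall, sleep_cpu = measure(time.sleep, 0.2)
busy_wall, busy_cpu = measure(busy_loop, 2_000_000)
print(f"sleep: wall={sleep_wall:.3f}s cpu={sleep_cpu:.3f}s")
print(f"busy:  wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
```

If a function is slow in wall-clock time but cheap in CPU time, it is probably waiting on I/O, the network or a sleep rather than computing.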

timeit

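For timing small snippets of code
Runs the given statement many times and reports the average time per execution (taking the best of several repeats)
Can be run from the command line

From the command line: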

python -m timeit '"-".join(str(n) for n in range(100))'

Documentation: https://docs.python.org/3.5/library/timeit.html

Optional arguments:

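-n N: execute the statement N times per loop. If omitted, a suitable number is determined automatically.
-r N: repeat the loop N times and report the best result. Default: 5 (3 before Python 3.7).
-s S: a setup statement, executed once before timing. Default: pass.
-p: measure process (CPU) time with time.process_time() instead of the default wall-clock timer, time.perf_counter().
-u U: time unit for the output (e.g. usec, msec, sec).

The older -t (time.time) and -c (time.clock) options were deprecated in Python 3.3 and removed in 3.7. A -p option that sets the display precision exists only in IPython's %timeit.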

For example:

$ python -m timeit -n 1000 "len([x**2 for x in range(1000)])"

timeit can also be imported as a module:

Documentation: http://docs.python.org/3/library/timeit.html#timeit.timeit

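import timeit
statement = "char in text"                 # the code being timed
setup_code = "char = 'f'; text = 'food'"   # runs once, before timing starts
print(timeit.timeit(statement, setup=setup_code, number=1000))

The setup argument is code that runs once before timing starts, for example to define the variables the statement uses. It is not included in the measurement. The statement itself is the code being measured.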
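A single measurement is noisy, so `timeit.repeat()` runs the whole timing several times. A common convention, also followed by the command-line tool, is to report the minimum, since the slower runs mostly reflect interference from other processes. A minimal sketch:

```python
import timeit

statement = "char in text"
setup_code = "char = 'f'; text = 'food'"   # re-run before each repetition, not timed
number = 100_000

# repeat() returns one total time (in seconds) per repetition
times = timeit.repeat(statement, setup=setup_code, repeat=5, number=number)
best = min(times)
print(f"best of 5: {best / number * 1e9:.1f} ns per loop")
```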

timeit is also available as an IPython magic command:

%timeit pass

u = None
%timeit u is None

%timeit -r 4 u == None

import time
%timeit -n1 time.sleep(2)

%timeit -n 10000 "f" in "food"

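Documentation: http://ipython.readthedocs.io/en/stable/interactive/magics.html?#magic-timeit

Getting more detailed information

The techniques in the previous section are only useful once you know where your code is slow. What if you only know that your code is slow, but not where the problem actually is? Test it piece by piece? Of course not: this is where a profiler comes in.

A profiler measures the program's performance while it runs and summarizes the results in a report.

Reported metrics can include:

Memory use over time
Memory allocated by each function
How often each function is called
Time spent in each function call
Cumulative time spent in sub-function calls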

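Python's built-in profilers

Python ships with a couple of profiler modules:

profile - written in pure Python. Use it if you need to extend the profiler; otherwise avoid it, because it is slow.
cProfile - the same API as profile, but written in C, so its overhead is much lower.

It can be used directly from the command line:

python -m cProfile [-o output_file] [-s sort_order] read_bna.py # invocation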

11111128 function calls in 8.283 seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 integrate.py:1(<module>)
 11111110    2.879    0.000    2.879    0.000 integrate.py:1(f)
[....]

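ncalls: number of calls
tottime: total time spent in the function itself, excluding time spent in the sub-functions it calls
percall: tottime / ncalls, the time per call
cumtime: cumulative time spent in the function, including time spent in sub-functions
percall: cumtime / ncalls
filename:lineno(function): where the function is defined

Analyzing the statistics

Use -o <filename> to write the results to a binary file. When profiling, keep these profile files so you can compare them with the results for the optimized program.

Profile files can be opened with pstats: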

python -m pstats

Step by step:

python -m cProfile -o prof_dump  ./read_bna.py
python -m pstats
% read prof_dump

# show stats:
prof_dump% stats

# only the top 5 results:
prof_dump% stats 5

# sort by cumulative time:
prof_dump% sort cumulative

# shorten long filenames for display:
prof_dump% strip
# show results again:
prof_dump% stats 5

pstats can, of course, also be used from Python code:

import pstats
p = pstats.Stats('prof_dump')
p.sort_stats('calls', 'cumulative')
p.print_stats()

# Output can be restricted via arguments to print_stats().
# Each restriction is either an integer (to select a count of lines),
# a decimal fraction between 0.0 and 1.0 inclusive (to select a percentage of lines),
# or a regular expression (to pattern match the standard name that is printed).
# If several restrictions are provided, then they are applied sequentially.
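Putting the pieces together, here is a self-contained sketch. It profiles a function in-process, saves the stats to a file so a later run can be compared against it, and reloads them with pstats. `slow_part`, `fast_part` and `main` are hypothetical stand-ins for your own code. The report is written to a `StringIO` only so it can be inspected afterwards.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_part():
    return sum(i * i for i in range(200_000))


def fast_part():
    return sum(range(1_000))


def main():
    slow_part()
    fast_part()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Save the raw stats, as `python -m cProfile -o prof_dump` would
path = os.path.join(tempfile.mkdtemp(), "prof_dump")
profiler.dump_stats(path)

# Reload, shorten file names, sort by cumulative time, keep the top 5 lines
out = io.StringIO()
stats = pstats.Stats(path, stream=out)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```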

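Decorator

As in the previous section, we can also wrap cProfile in a decorator. The code below is adapted from: https://juejin.cn/post/6844903760414654478

import cProfile
from functools import wraps

def func_cprofile(f):
    """
    Profile the decorated function with the built-in cProfile profiler
    """

    @wraps(f)
    def wrapper(*args, **kwargs):
        profile = cProfile.Profile()
        try:
            profile.enable()
            result = f(*args, **kwargs)
            return result
        finally:
            # Stop collecting and print the stats, even if f raised
            profile.disable()
            profile.print_stats(sort='time')

    return wrapper

Usage:

@func_cprofile
def test():
    # a modest loop: printing millions of lines would dominate the profile
    for x in range(100000):
        print(x)

test()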

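Other tools

SNAKEVIZ

A tool for visualizing profile files: https://jiffyclub.github.io/snakeviz/

Installation:

pip install snakeviz

Usage:

snakeviz program.prof

line profiler

Most of the methods above profile whole functions; line_profiler profiles a function line by line. Run the script through kernprof with -l to enable line-by-line profiling.

Decorate the functions you want to profile with @profile, then run: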

# the -v option will display the profile data immediately, instead
# of just writing it to <filename.py>.lprof
$ kernprof -l -v integrate_main.py

# load the output with
$ python -m line_profiler integrate_main.py.lprof

https://github.com/rkern/line_profiler
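`kernprof -l` injects `profile` into the builtins. A script decorated with `@profile` therefore raises `NameError` when run with plain `python`. A common workaround, sketched below, is a no-op fallback. The contents of `integrate_main.py` here are a hypothetical example (a simple rectangle-rule integration), not the original author's script.

```python
# integrate_main.py (hypothetical example script)
try:
    profile  # defined in builtins when run under `kernprof -l`
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs without kernprof."""
        return func


def f(x):
    return x ** 2 - x


@profile
def integrate_f(a, b, n):
    """Left rectangle rule for the integral of f over [a, b] with n steps."""
    s = 0.0
    dx = (b - a) / n
    for i in range(n):
        s += f(a + i * dx)
    return s * dx


if __name__ == "__main__":
    print(integrate_f(0, 2, 1_000_000))
```

Running `kernprof -l -v integrate_main.py` should then report per-line timings for `integrate_f`, while `python integrate_main.py` still works normally.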

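Memory profiling

One option is heapy, which is part of Guppy, a library for analyzing memory use in Python (for Python 3, the package is published as guppy3).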

from guppy import hpy; hp=hpy()
hp.doc.heap
hp.heap()
%run define.py Robot
hp.heap()

Other tools:

https://pypi.python.org/pypi/memory_profiler

http://mg.pov.lt/objgraph/

https://launchpad.net/meliae

http://pythonhosted.org/Pympler/muppy.html

http://jmdana.github.io/memprof/
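The standard library also ships `tracemalloc` (Python 3.4+), which needs nothing installed. It is not mentioned in the original article. Here is a minimal sketch, where `build_squares` is a hypothetical allocation-heavy function:

```python
import tracemalloc


def build_squares(n):
    return [i * i for i in range(n)]


tracemalloc.start()                               # begin tracing Python allocations
data = build_squares(100_000)
current, peak = tracemalloc.get_traced_memory()   # bytes in use now, and at the peak
snapshot = tracemalloc.take_snapshot()            # must be taken before stop()
tracemalloc.stop()

print(f"current={current / 1024:.1f} KiB, peak={peak / 1024:.1f} KiB")
# Top 3 source lines by memory allocated
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```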

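Improving Python performance

There are a number of ways to structure your Python code for better performance.

A few key techniques

Avoid repeated function calls, and use built-in functions instead of your own wherever possible.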

import time

x = 0

def doit1(i):
    global x
    x = x + i

numbers = range(100000)  # avoid shadowing the built-in list
t = time.time()
for i in numbers:
    doit1(i)  # one Python function call per element
print("%.3f" % (time.time() - t))

import time

x = 0

def doit2(numbers):
    global x
    for i in numbers:
        x = x + i  # the whole loop runs inside a single call

numbers = range(100000)
t = time.time()
doit2(numbers)
print("%.3f" % (time.time() - t))

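The second version is faster than the first. It pays the overhead of a Python function call once instead of 100,000 times.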
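The other half of the advice above, preferring built-in functions to hand-written ones, can be checked the same way. Here is a rough sketch with `timeit`, in which `manual_sum` is a hypothetical hand-written equivalent of the built-in `sum()`. The actual timings depend on your machine.

```python
import timeit

numbers = list(range(100_000))


def manual_sum(values):
    """Hand-written loop: every addition runs as Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


# Same result either way; the built-in does the loop in C
assert manual_sum(numbers) == sum(numbers)

t_manual = timeit.timeit(lambda: manual_sum(numbers), number=50)
t_builtin = timeit.timeit(lambda: sum(numbers), number=50)
print(f"manual loop: {t_manual:.3f}s, built-in sum(): {t_builtin:.3f}s")
```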

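String handling: use "".join(list_of_strings) instead of repeated +=.
List comprehensions, generator expressions or map() can be faster than explicit for loops.
Use C extension libraries, such as NumPy, for fast array operations.
Rewrite long-running code as C modules, using ctypes, Cython, SWIG, etc.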
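This is a quick way to check the first two points on your own machine, and the helper names are hypothetical. CPython can sometimes perform `s += piece` in place, so the gap for strings varies. PEP 8 notes that this optimization is fragile even in CPython and absent in implementations that don't use reference counting, so `str.join` remains the safe choice.

```python
import timeit

parts = [str(i) for i in range(10_000)]


def concat_plus(pieces):
    s = ""
    for piece in pieces:
        s += piece          # may build a new string on every iteration
    return s


def concat_join(pieces):
    return "".join(pieces)  # allocates the result string once


def squares_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    return [i * i for i in range(n)]


# Same results, different speed
assert concat_plus(parts) == concat_join(parts)
assert squares_loop(10_000) == squares_comprehension(10_000)

for label, fn in [
    ("+= concatenation", lambda: concat_plus(parts)),
    ("str.join", lambda: concat_join(parts)),
    ("for + append", lambda: squares_loop(10_000)),
    ("list comprehension", lambda: squares_comprehension(10_000)),
]:
    print(f"{label:20s} {timeit.timeit(fn, number=100):.4f}s")
```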

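Memory management

Sometimes moving data around in memory takes longer than the computation itself. So for large datasets, keep in mind:

Use the right data structures
Use efficient algorithms
Use generators and iterators instead of lists.
Use iterators to pull just the data you need from databases, sockets, files, ...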
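Here is a small sketch of why generators help with large datasets. A list comprehension materializes every element, while the equivalent generator expression keeps only its current state. The exact sizes printed depend on the Python version and platform.

```python
import sys

N = 1_000_000

squares_list = [i * i for i in range(N)]   # all N results held in memory at once
squares_gen = (i * i for i in range(N))    # values produced lazily, one at a time

# sys.getsizeof() is shallow: for the list it counts the array of references,
# not the int objects themselves, so the real difference is even larger.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"list: {list_size:,} bytes, generator: {gen_size:,} bytes")

# A generator can be consumed only once; sum() exhausts it here.
total = sum(squares_gen)
print(total == sum(squares_list))  # → True
```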

Copyright: author name
Article link: https://www.sitstars.com/archives/107/
When reposting, the source and this notice must be credited.
