Python High Performance Programming
This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!
Python High Performance Programming
Performance Analysis
Time Analysis
Use decorators to automatically measure time:
1
2
3
4
5
6
7
8
9
10
11
12from functools import wraps
import time
def timefn(fn):
def measure_time(*args, **kwargs):
t1=time.time()
result=fn(*args, **kwargs)
t2=time.time()
print("@timefn:"+fn.__name__+" took "+str(t2-t1)+" seconds")
return result
return measure_timeIn IPython, you can use the
%timeit
magic function to time function runs:1
%timeit <function>
Use the UNIX
time
command to time:1
/usr/bin/time -p (--verbose) python <script-name>
Use the
cProfile
module:cProfile
is a built-in analysis tool in the standard library. It hooks into CPython's virtual machine to measure the time it takes for each function to run. Add the@profile
decorator before the function to be analyzed, and run the following command for performance analysis:1
python -m cProfile -s cumulative <script-name> <options>
You can also save the analysis results to a statistics file and use other tools for analysis:
1
python -m cProfile -o <outputfilename> <script-name> <options>
Use
line_profiler
for line-by-line analysis:Install:
pip install line_profiler
Add the
@profile
decorator before the function to be analyzed, and run the following command for performance analysis:1
kernprof -l -v <script-name>
Space Analysis
Use
memory_profiler
to analyze memory usage:Install:
pip install memory_profiler
Add the
@profile
decorator before the function to be analyzed, and run the following command for performance analysis:1
python -m memory_profiler <script-name>
Use
mprof
to plot memory usage. This can be done in two steps:Run the program using
mprof
, which will generate anmprof*.dat
file in the directory:1
mprof run <script-name>
Plot the graph:
1
mprof plot
Lists and Tuples
Data structures such as lists and tuples are called arrays, which are flat lists of data in some intrinsic order.
The main differences between lists and tuples are as follows:
- Lists are dynamic arrays that are mutable and can be resized. Tuples are static arrays that are immutable and cannot be changed once created.
- Tuples are cached in the Python runtime environment, which means that there is no need to access the kernel to allocate memory every time a tuple is used.
- Lists require more memory overhead than tuples.
Lists
The query operation for lists is O(1).
The default search operation for lists (list.index()
) is
a linear operation, with a worst-case time complexity of O(n). Sorting
the list can reduce the search time to O(log n) (using binary
search).
Python lists have a built-in sorting algorithm that uses Tim sort. Tim sort sorts with a complexity of O(n) in the best case (O(n log n) in the worst case). It mixes insertion sort and merge sort to achieve this performance (guessing which algorithm is better for a given data set using a probing method).
The bisect
module can keep the order of the list
unchanged (default from small to large) when inserting into the list.
Some documentation for bisect
is as follows[2]:
bisect.bisect_left(a, x, lo=0, hi=len(a), *, key=None)
Find the leftmost point of insertion for x in list a
bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)
Find the rightmost point of insertion for x in list a
bisect.bisect(a, x, lo=0, hi=len(a), *, key=None)
Same as
bisect.bisect_left
bisect.insort_left(a, x, lo=0, hi=len(a), *, key=None)
Insert x into list a in sorted order, with the insertion point determined by
bisect.bisect_left
bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)
Insert x into list a in sorted order, with the insertion point determined by
bisect.bisect_right
bisect.insort(a, x, lo=0, hi=len(a), ***, key=None)
Same as
bisect.insort_right
Performance Notes
When writing time-sensitive code using
bisect()
andinsort()
, keep the following concepts in mind.
- Binary search is efficient for searching a certain range of values. For locating a specific value, dictionary performance is better.
- The time complexity of the
insort()
function is O(n) because the logarithmic search steps are dominated by the linear insertion steps.- These search functions are stateless and discard the results of the key function after they are used. Therefore, if the search function is used in a loop, the key function may be called repeatedly on the same data element. If the key function is not fast, consider wrapping it with
functools.cache()
to avoid redundant calculations. Alternatively, consider searching a precomputed key array to locate the insertion point (as demonstrated in the example section below).
When allocating memory for a list, a certain amount of reserved space is added. When a list of size N needs to add new elements, Python creates a new list that is large enough to hold the original N elements plus the additional elements. In actual allocation, the size allocated is not N+1, but N+M, where M is calculated as follows:
M = (N >> 3) + (N < 9 ? 3 : 6) + 1
N | 0 | 1-4 | 5-8 | 9-16 | 17-25 | 26-35 | ... | 991-1120 |
---|---|---|---|---|---|---|---|---|
M | 0 | 1 | 2 | 3 | 4 | 5 | ... | 140 |