Another blog post in which I use sys.settrace, this time to solve a real problem.
When working with new modules, it is sometimes beneficial to get a glimpse of which entities of a module are actually used. I wrote something comparable in my blog post Instrumenting Java Code to Find and Handle Unused Classes, but this time, I need it in Python and with method-level granularity.
TL;DR
Download trace.py from GitHub and use it to print a call tree and a list of used methods and classes to the error output:
import trace
trace.setup(r"MODULE_REGEX", print_location_=True)
Implementation
This could be a hard problem, but it isn’t when we’re using sys.settrace to set a handler for every method and function call, reapplying the knowledge we gained in my Let’s create a debugger together series to develop a small utility.
There are essentially six different types of functions (this sample code is on GitHub):
def log(message: str):
    print(message)

class TestClass:
    # static initializer of the class
    x = 100

    def __init__(self):
        # constructor
        log("instance initializer")

    def instance_method(self):
        # instance method, self is bound to an instance
        log("instance method")

    @staticmethod
    def static_method():
        log("static method")

    @classmethod
    def class_method(cls):
        log("class method")

def free_function():
    log("free function")
This is important because we have to handle each of them differently later on. But first, let’s define a few helpers and configuration variables:
indent = 0
module_matcher: str = ".*"
print_location: bool = False
We also want to print a method call-tree, so we use indent to track the current indentation level. The module_matcher is the regular expression that we use to determine whether we want to consider a module, its classes, and methods. This could, e.g., be __main__ to only consider the main module. The print_location tells us whether we want to print the path and line location for every element in the call tree.
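To get a feeling for how such a module regex behaves, here is a minimal sketch. The helper name module_matches is made up for this illustration; it only mirrors the re.match call that the trace handler uses. Because re.match anchors the pattern at the start of the string but not at the end, a pattern like __main__ also accepts any module whose name merely begins with it.

```python
import re


def module_matches(pattern: str, module_name: str) -> bool:
    """Hypothetical helper: same check as the handler's re.match call."""
    # re.match only anchors at the beginning of module_name
    return re.match(pattern, module_name) is not None


# the default ".*" accepts every module, even an empty module name
print(module_matches(r".*", "json.decoder"))
# "__main__" accepts the main module ...
print(module_matches(r"__main__", "__main__"))
# ... but also every module name that starts with "__main__"
print(module_matches(r"__main__", "__main__helpers"))
# an explicit boundary restricts the match to a package and its submodules
print(module_matches(r"json(\.|$)", "json.decoder"))
print(module_matches(r"json(\.|$)", "jsonschema"))
```

If the prefix behavior is not wanted, ending the pattern with $ or with a boundary group like the one in the last two calls makes the match stricter.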
Now to the main helper class:
import sys
from dataclasses import dataclass, field
from typing import Dict, Set

def log(message: str):
    print(message, file=sys.stderr)

STATIC_INIT = "<static init>"

@dataclass
class ClassInfo:
    """ Used methods of a class """
    name: str
    used_methods: Set[str] = field(default_factory=set)

    def print(self, indent_: str):
        log(indent_ + self.name)
        for method in sorted(self.used_methods):
            log(indent_ + " " + method)

    def has_only_static_init(self) -> bool:
        # compare the sets instead of calling pop(),
        # which would remove the entry from used_methods
        return self.used_methods == {STATIC_INIT}

used_classes: Dict[str, ClassInfo] = {}
free_functions: Set[str] = set()
The ClassInfo stores the used methods of a class. We store the ClassInfo instances of used classes and the free functions in global variables.
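The snippets further down access this dictionary through a get_class_info helper that is not shown in this post. A minimal sketch of what such a helper could look like, assuming it simply creates the ClassInfo entry on first use (the version in trace.py on GitHub may differ):

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClassInfo:
    """ Used methods of a class (same shape as in the post) """
    name: str
    used_methods: Set[str] = field(default_factory=set)


used_classes: Dict[str, ClassInfo] = {}


def get_class_info(name: str) -> ClassInfo:
    """ Assumed behavior: return the entry for name, creating it if missing """
    if name not in used_classes:
        used_classes[name] = ClassInfo(name)
    return used_classes[name]


# registering a method creates the class entry on the fly
get_class_info("__main__.TestClass").used_methods.add("instance_method")
```

With a helper like this, the tracking code never has to check whether a class has been seen before.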
Now to our call handler, which we pass to sys.settrace:
import inspect
import re
from types import FrameType

def handler(frame: FrameType, event: str, *args):
    """ Trace handler that prints and tracks called functions """
    # find module name
    module_name: str = mod.__name__ if (
        mod := inspect.getmodule(frame.f_code)) else ""
    # get name of the code object
    func_name = frame.f_code.co_name
    # check that the module matches the defined regexp
    if not re.match(module_matcher, module_name):
        return
    # keep indent in sync
    # this is the only reason why we need
    # the return events and use an inner trace handler
    global indent
    if event == 'return':
        indent -= 2
        return
    if event != "call":
        return
    # insert the current function/method
    name = insert_class_or_function(module_name, func_name, frame)
    # print the current location if necessary
    if print_location:
        do_print_location(frame)
    # print the current function/method
    log(" " * indent + name)
    # keep the indent in sync
    indent += 2
    # return this as the inner handler to get
    # return events
    return handler
def setup(module_matcher_: str = ".*", print_location_: bool = False):
    # ...
    sys.settrace(handler)
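The handler above returns itself so that it also receives the return events of the frames it has seen. The following stripped-down sketch shows just this mechanism; tracer, inner, outer, and run_traced are names made up for the illustration and are not part of trace.py. The function passed to sys.settrace is invoked for every call event, and whatever it returns becomes the local trace function of the new frame, which then receives the line and return events.

```python
import sys
from types import FrameType
from typing import List, Tuple

events: List[Tuple[str, str]] = []


def tracer(frame: FrameType, event: str, arg):
    # record only the events that matter for the indentation logic
    if event in ("call", "return"):
        events.append((event, frame.f_code.co_name))
    # returning the tracer installs it as the local trace function,
    # without this we would never see the "return" events
    return tracer


def inner() -> int:
    return 1


def outer() -> int:
    return inner() + 1


def run_traced() -> List[Tuple[str, str]]:
    events.clear()
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        outer()
    finally:
        # restore whatever trace function was installed before
        sys.settrace(previous)
    return list(events)


for event, name in run_traced():
    print(event, name)
```

Every call event is paired with a return event in reverse order, which is what keeps the indent counter balanced.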
Now, we “only” have to get the name for the code object and collect it properly in either a ClassInfo instance or the set of free functions. The base case is easy: When the current frame contains a local variable self, we probably have an instance method, and when it contains a cls variable, we have a class method.
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # ...
def insert_class_or_instance_function(module_name: str,
                                      func_name: str,
                                      frame: FrameType) -> str:
    """
    Insert the code object of an instance or class function and
    return the name to print
    """
    class_name = ""
    if "self" in frame.f_locals:
        # instance methods
        class_name = frame.f_locals["self"].__class__.__name__
    elif "cls" in frame.f_locals:
        # class method
        class_name = frame.f_locals["cls"].__name__
        # we prefix the class method name with "<class>"
        func_name = "<class>" + func_name
    # add the module name to class name
    class_name = module_name + "." + class_name
    # record the method in the ClassInfo of its class
    get_class_info(class_name).used_methods.add(func_name)
    # return the string to print in the class tree
    return class_name + "." + func_name
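To see why looking for self and cls in the frame's locals works at all, here is a small sketch that records the local variable names present at each call event. All names in it (Demo, oddly_named, collect) are invented for the illustration. At the time of the call event, the arguments are already bound, so they show up in frame.f_locals.

```python
import sys
from types import FrameType
from typing import Dict, List

seen: Dict[str, List[str]] = {}


def tracer(frame: FrameType, event: str, arg):
    if event == "call":
        # arguments are already bound when the call event fires
        seen[frame.f_code.co_name] = sorted(frame.f_locals)
    # no local trace function needed, we only care about calls
    return None


class Demo:
    def instance_method(self):
        pass

    @classmethod
    def class_method(cls):
        pass

    @staticmethod
    def static_method():
        pass


def oddly_named(self):
    # a free function that happens to call its parameter "self"
    pass


def collect() -> Dict[str, List[str]]:
    seen.clear()
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        Demo().instance_method()
        Demo.class_method()
        Demo.static_method()
        oddly_named(None)
    finally:
        sys.settrace(previous)
    return dict(seen)


for name, local_names in collect().items():
    print(name, local_names)
```

The last call shows the limit of the heuristic: it is purely name-based, so a free function with a parameter named self looks like an instance method. This is why the check only tells us that we probably have an instance method.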
But how about the other three cases? We use the header line of a method to distinguish between them:
from enum import Enum
from pathlib import Path
from types import CodeType

class StaticFunctionType(Enum):
    INIT = 1
    """ static init """
    STATIC = 2
    """ static function """
    FREE = 3
    """ free function, not related to a class """

def get_static_type(code: CodeType) -> StaticFunctionType:
    # look at the source line where the code object starts
    file_lines = Path(code.co_filename).read_text().split("\n")
    line = code.co_firstlineno
    header_line = file_lines[line - 1]
    if "class " in header_line:
        # e.g. "class TestClass"
        return StaticFunctionType.INIT
    if "@staticmethod" in header_line:
        return StaticFunctionType.STATIC
    return StaticFunctionType.FREE
These are, of course, just approximations, but they work well enough for a small utility used for exploration.
If you know any other way that doesn’t involve using the Python AST, feel free to post in a comment below.
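One small refinement that stays within the header-line approach: the snippet above re-reads the whole source file on every call, while the standard library's linecache module caches file contents and returns an empty string instead of raising when the source is unavailable. A minimal sketch, with header_line as a made-up helper name that is not part of trace.py:

```python
import linecache
from types import CodeType


def header_line(code: CodeType) -> str:
    """
    Hypothetical helper: return the source line at which the code
    object starts, or "" if the source cannot be read
    """
    # linecache keeps the file contents in memory after the first access
    return linecache.getline(code.co_filename, code.co_firstlineno)


def example():
    pass


# prints the "def example():" line when this code lives in a real file
print(repr(header_line(example.__code__)))
```

The returned line keeps its trailing newline, which does not matter for the substring checks used in get_static_type.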
Using the get_static_type function, we can now finish the insert_class_or_function function:
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # get the type of the current code object
    t = get_static_type(frame.f_code)
    if t == StaticFunctionType.INIT:
        # static initializer, the top level class code
        # func_name is actually the class name here,
        # but classes are technically also callable function
        # objects
        class_name = module_name + "." + func_name
        get_class_info(class_name).used_methods.add(STATIC_INIT)
        return class_name + "." + STATIC_INIT
    elif t == StaticFunctionType.STATIC:
        # @staticmethod
        # the qualname is in our example TestClass.static_method,
        # so we have to drop the last part of the name to get
        # the class name
        # (co_qualname requires Python 3.11 or newer)
        class_name = module_name + "." + frame.f_code.co_qualname[
            :-len(func_name) - 1]
        # we prefix static method names with "<static>"
        func_name = "<static>" + func_name
        get_class_info(class_name).used_methods.add(func_name)
        return class_name + "." + func_name
    free_functions.add(frame.f_code.co_name)
    return module_name + "." + func_name
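The slicing of the qualified name in the static method branch can also be written with str.rpartition, which some readers may find easier to follow. The sketch below works on plain strings; split_qualname is a made-up name for the illustration and not part of trace.py.

```python
from typing import Tuple


def split_qualname(qualname: str) -> Tuple[str, str]:
    """ Split "Outer.Inner.method" into ("Outer.Inner", "method") """
    # rpartition splits at the last dot; without a dot, the owner is ""
    owner, _, name = qualname.rpartition(".")
    return owner, name


print(split_qualname("TestClass.static_method"))
print(split_qualname("Outer.Inner.static_method"))
print(split_qualname("free_function"))
print(split_qualname("outer.<locals>.helper"))
```

The last call hints at a caveat of any qualname-based approach: for a function defined inside another function, the part before the last dot contains a <locals> marker and is not a class name.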
The final thing left to do is to register a teardown handler to print the collected information on exit:
def teardown():
    """ Teardown the tracer and print the results """
    sys.settrace(None)
    log("********** Trace Results **********")
    print_info()

# trigger teardown on exit
atexit.register(teardown)
Usage
We now prefix our sample program from the beginning with
import trace
trace.setup(r"__main__")
to collect all information for the __main__ module, which is the module passed directly to the Python interpreter.
We also append some code to our program that calls all methods and functions:
def all_methods():
    log("all methods")
    TestClass().instance_method()
    TestClass.static_method()
    TestClass.class_method()
    free_function()

all_methods()
Our utility library then prints the following upon execution:
standard error:
__main__.TestClass.<static init>
__main__.all_methods
  __main__.log
  __main__.TestClass.__init__
    __main__.log
  __main__.TestClass.instance_method
    __main__.log
  __main__.TestClass.<static>static_method
    __main__.log
  __main__.TestClass.<class>class_method
    __main__.log
  __main__.free_function
    __main__.log
********** Trace Results **********
Used classes:
only static init:
not only static init:
__main__.TestClass
<class>class_method
<static init>
<static>static_method
__init__
instance_method
Free functions:
all_methods
free_function
log
standard output:
all methods
instance initializer
instance method
static method
class method
free function
Conclusion
This small utility uses the power of sys.settrace (and some string processing) to find a module’s used classes, methods, and functions and the call tree. The utility is pretty helpful when trying to grasp the inner structure of a module and the module entities used transitively by your own application code.
I published this code under the MIT license on GitHub, so feel free to improve, extend, and modify it. Come back in a few weeks to see why I actually developed this utility…
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.