Another blog post in which I use sys.settrace. This time to solve a real problem.
When working with new modules, it is sometimes beneficial to get a glimpse of which entities of a module are actually used. I wrote something comparable in my blog post Instrumenting Java Code to Find and Handle Unused Classes, but this time, I need it in Python and with method-level granularity.
TL;DR
Download trace.py from GitHub and use it to print a call tree and a list of used methods and classes to the error output:
import trace
trace.setup(r"MODULE_REGEX", print_location_=True)
Implementation
This could be a hard problem, but it isn’t when we’re using sys.settrace to set a handler for every method and function call, reapplying the knowledge we gained in my Let’s create a debugger together series to develop a small utility.
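As a quick refresher, here is a minimal sketch of how sys.settrace delivers a "call" event for every new Python frame. The names tracer, greet, helper, and calls are made up for this illustration and are not part of trace.py:

```python
import sys
from types import FrameType
from typing import List

calls: List[str] = []


def tracer(frame: FrameType, event: str, arg):
    # the global trace function receives a "call" event
    # for every new Python-level frame
    if event == "call":
        calls.append(frame.f_code.co_name)
    # returning None means: no local (line/return) tracing for this frame
    return None


def helper():
    return "hi"


def greet():
    return helper()


sys.settrace(tracer)
try:
    greet()
finally:
    # always remove the tracer again
    sys.settrace(None)

print(calls)  # ['greet', 'helper']
```

Functions implemented in C (like sys.settrace itself) do not create Python frames, so they never show up in the list.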
There are essentially six different types of functions (this sample code is on GitHub):
def log(message: str):
    print(message)


class TestClass:
    # static initializer of the class
    x = 100

    def __init__(self):
        # constructor
        log("instance initializer")

    def instance_method(self):
        # instance method, self is bound to an instance
        log("instance method")

    @staticmethod
    def static_method():
        log("static method")

    @classmethod
    def class_method(cls):
        log("class method")


def free_function():
    log("free function")
The distinction matters because we have to handle each type differently below. But first, let’s define a few helpers and configuration variables:
indent = 0
module_matcher: str = ".*"
print_location: bool = False
We also want to print a method call-tree, so we use indent to track the current indentation level. The module_matcher is the regular expression that we use to determine whether we want to consider a module, its classes, and methods. This could, e.g., be __main__ to only consider the main module. The print_location flag tells us whether we want to print the path and line location for every element in the call tree.
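One detail worth keeping in mind when choosing the regular expression: the handler shown later uses re.match, which only anchors at the start of the string. The following sketch illustrates the consequences; module_matches is a hypothetical helper for this example, not part of trace.py:

```python
import re


def module_matches(pattern: str, module_name: str) -> bool:
    # re.match anchors at the beginning of the string only
    return re.match(pattern, module_name) is not None


# the default pattern accepts every module
assert module_matches(r".*", "json.decoder")
assert module_matches(r"__main__", "__main__")

# a plain name acts as a prefix match: submodules match,
# but so do unrelated modules sharing the prefix
assert module_matches(r"json", "json.decoder")
assert module_matches(r"json", "jsonschema")

# anchoring the end of the package name excludes the latter
assert not module_matches(r"json(\.|$)", "jsonschema")
assert module_matches(r"json(\.|$)", "json.decoder")
```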
Now to the main helper class:
import sys
from dataclasses import dataclass, field
from typing import Dict, Set


def log(message: str):
    print(message, file=sys.stderr)


STATIC_INIT = "<static init>"


@dataclass
class ClassInfo:
    """ Used methods of a class """
    name: str
    used_methods: Set[str] = field(default_factory=set)

    def print(self, indent_: str):
        log(indent_ + self.name)
        for method in sorted(self.used_methods):
            log(indent_ + " " + method)

    def has_only_static_init(self) -> bool:
        # compare the set instead of calling pop(),
        # which would remove the entry as a side effect
        return self.used_methods == {STATIC_INIT}


used_classes: Dict[str, ClassInfo] = {}
free_functions: Set[str] = set()
ClassInfo stores the used methods of a class. We keep the ClassInfo instances of the used classes and the set of used free functions in global variables.
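The code further down calls a get_class_info helper that this post does not show. One plausible shape, assuming it simply creates the entry in used_classes on first use, could look like this (a sketch, not necessarily what trace.py does):

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClassInfo:
    """ Reduced copy of the class above, to keep the sketch self-contained """
    name: str
    used_methods: Set[str] = field(default_factory=set)


used_classes: Dict[str, ClassInfo] = {}


def get_class_info(name: str) -> ClassInfo:
    """ Assumed behavior: return the entry for a class, creating it if missing """
    if name not in used_classes:
        used_classes[name] = ClassInfo(name)
    return used_classes[name]


get_class_info("__main__.TestClass").used_methods.add("__init__")
get_class_info("__main__.TestClass").used_methods.add("instance_method")
print(sorted(used_classes["__main__.TestClass"].used_methods))
# ['__init__', 'instance_method']
```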
Now to our call handler that we pass to sys.settrace:
import inspect
import re
from types import FrameType


def handler(frame: FrameType, event: str, *args):
    """ Trace handler that prints and tracks called functions """
    # find module name
    module_name: str = mod.__name__ if (
        mod := inspect.getmodule(frame.f_code)) else ""
    # get name of the code object
    func_name = frame.f_code.co_name
    # check that the module matches the defined regexp
    if not re.match(module_matcher, module_name):
        return
    # keep indent in sync
    # this is the only reason why we need
    # the return events and use an inner trace handler
    global indent
    if event == 'return':
        indent -= 2
        return
    if event != "call":
        return
    # insert the current function/method
    name = insert_class_or_function(module_name, func_name, frame)
    # print the current location if necessary
    if print_location:
        do_print_location(frame)
    # print the current function/method
    log(" " * indent + name)
    # keep the indent in sync
    indent += 2
    # return this as the inner handler to get
    # return events
    return handler


def setup(module_matcher_: str = ".*", print_location_: bool = False):
    # ...
    sys.settrace(handler)
Now, we “only” have to get the name for the code object and collect it properly in either a ClassInfo instance or the set of free functions. The base case is easy: When the current frame contains a local variable self, we probably have an instance method, and when it contains a cls variable, we have a class method.
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # ...


def insert_class_or_instance_function(module_name: str, func_name: str,
                                      frame: FrameType) -> str:
    """
    Insert the code object of an instance or class function and
    return the name to print
    """
    class_name = ""
    if "self" in frame.f_locals:
        # instance methods
        class_name = frame.f_locals["self"].__class__.__name__
    elif "cls" in frame.f_locals:
        # class method
        class_name = frame.f_locals["cls"].__name__
        # we prefix the class method name with "<class>"
        func_name = "<class>" + func_name
    # add the module name to class name
    class_name = module_name + "." + class_name
    get_class_info(class_name).used_methods.add(func_name)
    # return the string to print in the call tree
    return class_name + "." + func_name
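To see why this check is only a heuristic, the following sketch records the local variable names that are visible at the "call" event. The names probe, Sample, and misleading are invented for this illustration:

```python
import sys
from types import FrameType
from typing import Dict, List

seen: Dict[str, List[str]] = {}


def probe(frame: FrameType, event: str, arg):
    if event == "call":
        # at call time, f_locals already holds the bound arguments
        seen[frame.f_code.co_name] = sorted(frame.f_locals.keys())
    return None


class Sample:
    def instance_method(self):
        pass

    @classmethod
    def class_method(cls):
        pass

    @staticmethod
    def static_method():
        pass


def misleading(self):
    # a free function whose parameter just happens to be called "self"
    pass


sys.settrace(probe)
try:
    Sample().instance_method()
    Sample.class_method()
    Sample.static_method()
    misleading(42)
finally:
    sys.settrace(None)

print(seen)
```

The instance method exposes self and the class method exposes cls, as expected. The static method has no such local, which is why it needs separate handling. The misleading function would be misclassified as an instance method of int, which is the price of relying on a naming convention.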
But how about the other three cases? We use the header line of a method to distinguish between them:
from enum import Enum
from pathlib import Path
from types import CodeType


class StaticFunctionType(Enum):
    INIT = 1
    """ static init """
    STATIC = 2
    """ static function """
    FREE = 3
    """ free function, not related to a class """


def get_static_type(code: CodeType) -> StaticFunctionType:
    file_lines = Path(code.co_filename).read_text().split("\n")
    line = code.co_firstlineno
    header_line = file_lines[line - 1]
    if "class " in header_line:
        # e.g. "class TestClass"
        return StaticFunctionType.INIT
    if "@staticmethod" in header_line:
        return StaticFunctionType.STATIC
    return StaticFunctionType.FREE
These are, of course, just approximations, but they work well enough for a small utility used for exploration.
If you know any other way that doesn’t involve using the Python AST, feel free to post in a comment below.
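One possible alternative for the class-body case, sketched here under the assumption that we are running on CPython: code objects of functions carry the CO_OPTIMIZED flag, while class bodies and module-level code do not. The helper is_class_body and the demo names are hypothetical, and this does not help with telling static methods apart from free functions:

```python
import inspect
import sys
from types import CodeType, FrameType
from typing import List, Tuple


def is_class_body(code: CodeType) -> bool:
    """
    CPython-specific heuristic: function code is compiled with
    CO_OPTIMIZED, class bodies and module-level code are not
    """
    return (not code.co_flags & inspect.CO_OPTIMIZED
            and code.co_name != "<module>")


seen: List[Tuple[str, bool]] = []


def probe(frame: FrameType, event: str, arg):
    if event == "call":
        seen.append((frame.f_code.co_name, is_class_body(frame.f_code)))
    return None


def demo():
    # the class body is executed (and traced) when demo() runs
    class Sample:
        x = 100

        def method(self):
            pass

    Sample().method()


sys.settrace(probe)
try:
    demo()
finally:
    sys.settrace(None)

print(seen)  # [('demo', False), ('Sample', True), ('method', False)]
```

Compared to reading the header line, this avoids touching the source file, but it depends on an implementation detail of the compiler.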
Using the get_static_type function, we can now finish the insert_class_or_function function:
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # get the type of the current code object
    t = get_static_type(frame.f_code)

    if t == StaticFunctionType.INIT:
        # static initializer, the top level class code
        # func_name is actually the class name here,
        # but classes are technically also callable function
        # objects
        class_name = module_name + "." + func_name
        get_class_info(class_name).used_methods.add(STATIC_INIT)
        return class_name + "." + STATIC_INIT
    elif t == StaticFunctionType.STATIC:
        # @staticmethod
        # the qualname is in our example TestClass.static_method,
        # so we have to drop the last part of the name to get
        # the class name
        # (co_qualname is available since Python 3.11)
        class_name = module_name + "." + frame.f_code.co_qualname[
                                         :-len(func_name) - 1]
        # we prefix static method names with "<static>"
        func_name = "<static>" + func_name
        get_class_info(class_name).used_methods.add(func_name)
        return class_name + "." + func_name
    free_functions.add(frame.f_code.co_name)
    return module_name + "." + func_name
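The slicing of the qualified name is easy to get wrong, so here is a small sketch of what it produces. It uses the __qualname__ attribute of function objects, which has the same format as co_qualname but also exists on older Python versions. The helper class_part and the sample definitions are made up for this example:

```python
def class_part(qualname: str, func_name: str) -> str:
    # drop the trailing ".<func_name>" from the qualified name
    return qualname[:-len(func_name) - 1]


class Outer:
    class Inner:
        @staticmethod
        def static_method():
            pass


def factory():
    def nested():
        pass
    return nested


f = Outer.Inner.static_method
print(class_part(f.__qualname__, f.__name__))  # Outer.Inner

n = factory()
print(class_part(n.__qualname__, n.__name__))  # factory.<locals>
```

Nested classes keep their full path, so the recorded class name stays unambiguous. Functions defined inside other functions contain a <locals> component, which shows that the prefix is not always a class.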
The final thing left to do is to register a teardown handler to print the collected information on exit:
def teardown():
    """ Teardown the tracer and print the results """
    sys.settrace(None)
    log("********** Trace Results **********")
    print_info()


# trigger teardown on exit
atexit.register(teardown)
Usage
We now prefix our sample program from the beginning with
import trace
trace.setup(r"__main__")
to collect all information for the __main__ module, which is the module passed directly to the Python interpreter.
We append to our program some code to call all methods/functions:
def all_methods():
    log("all methods")
    TestClass().instance_method()
    TestClass.static_method()
    TestClass.class_method()
    free_function()


all_methods()
Our utility library then prints the following upon execution:
standard error:
__main__.TestClass.<static init>
__main__.all_methods
  __main__.log
  __main__.TestClass.__init__
    __main__.log
  __main__.TestClass.instance_method
    __main__.log
  __main__.TestClass.<static>static_method
    __main__.log
  __main__.TestClass.<class>class_method
    __main__.log
  __main__.free_function
    __main__.log
********** Trace Results **********
Used classes:
  only static init:
  not only static init:
    __main__.TestClass
      <class>class_method
      <static init>
      <static>static_method
      __init__
      instance_method
Free functions:
  all_methods
  free_function
  log

standard output:
all methods
instance initializer
instance method
static method
class method
free function
Conclusion
This small utility uses the power of sys.settrace (and some string processing) to find a module’s used classes, methods, and functions and the call tree. The utility is pretty helpful when trying to grasp the inner structure of a module and the module entities used transitively by your own application code.
I published this code under the MIT license on GitHub, so feel free to improve, extend, and modify it. Come back in a few weeks to see why I actually developed this utility…
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.