Refactoring
Bowler builds on top of the Concrete Syntax Tree (CST) provided by lib2to3 in the standard library. This enables direct modifications to the tree, without losing formatting, comments, or whitespace, and without needing to directly support every new version of Python as they get released.
Fixers
Fixers are a core concept of refactoring in lib2to3, and they contain both
a grammar-based search "pattern" for finding nodes in the CST as well as methods for
transforming nodes that match the pattern. They are relatively self-contained, and
form the core component of the 2to3
tool used for migrating older Python structures
to modern Python 3 analogues.
Bowler wraps much of its framework around generating a set of fixers, and then using a modified version of the lib2to3 refactoring framework to apply these fixers to the target Python source files. This allows Bowler to reuse and benefit from most of the features in lib2to3 while still allowing it to provide custom behaviors and functionality not found in the standard library.
Queries
Bowler operates by building "queries" against the CST. Queries consist of one or more "transforms", which specify what elements of the syntax tree to modify, as well as how to modify those elements. Each transform represents a search against the syntax tree, along with a programmatic modification to any matched tree elements. These transforms generate a fixer for lib2to3, and Bowler will execute multiple transforms simultaneously on each file processed. By combining multiple transforms together, Bowler enables a wide range of refactoring, from simple renames to complicated upgrades from old APIs to their modern replacements.
The primary mechanism for building queries and transforms in Bowler is by using a "fluent" API:
A fluent interface is a method for designing object oriented APIs based extensively on method chaining with the goal of making the readability of the source code close to that of ordinary written prose, essentially creating a domain-specific language within the interface. -- Fluent Interface
This allows you to chain method calls to the Query
class one after the other, without
needing to assign a query object to a variable or reference that variable for every
method call. In practice, that looks something like this:
(
Query()
.select(...)
.modify(...)
.execute()
)
Transforms
Transforms are implicitly created by the Query
API while defining the desired query.
Each transform begins with a "selector" (the syntax tree pattern to search for), or an
existing fixer class that already contains a selector pattern. Then, any number of
"filters" are specified, followed by one or more "modifiers". Each time a selector or
fixer is added to the query, a new transform is generated, and subsequent filters or
modifiers are added to the transform.
Repeating this pattern of selector, filters, and modifiers will generate further transforms, each of which will be executed as elements in the syntax tree match the selector patterns for a given transform.
Selectors
These represents lib2to3 search patterns to select syntax tree nodes matching the given pattern, while capturing relevant child nodes or leaves when available. Bowler includes a set of common selectors that attempt to find all definitions or references of a specific type, and uses a consistent naming scheme for all captured elements.
An example selector to find all uses of the print
function might look like:
pattern = """
power< "print"
trailer< "(" print_args=any* ")" >
>
"""
(
Query()
.select(pattern)
...
)
This example finds all function calls named print
, followed immediately by the
parenthetical trailer, and captures everything contained in those parentheses using the
name print_args
.
Filters
Once elements are found matching the active selector, filters can further limit which
elements are passed to the modifiers by inspecting the original matches and returning
True
if it still matches, or False
if the element should be excluded. Elements
will only be considered as matches if they successfully pass all filter functions.
If an element fails one filter function, it will not be passed or considered for any
other filter functions.
An example filter, to pair with the example selector above, that only matches calls to the print function with string literal arguments:
def print_string(node: LN, capture: Capture, filename: Filename) -> bool:
args = capture.get("print_args")
# args will be a list because we captured `any*`
if args:
return len(args) == 1 and args[0].type == TOKEN.STRING:
return False
(
Query()
...
.filter(print_string)
...
)
This filter function will only return True
if the matched print call contained
exactly one argument, and that argument's type was a string literal.
Modifiers
After matched elements have been filtered, all remaining matches will be passed to all modifier functions in sequence. The modifier function may then transform the syntax tree at any point from the root to the leaves, without being restricted to the branch corresponding to the matched element. Transformations can include modifying existing nodes or leaves, removing or replacing elements, or inserting elements into the tree.
An example modifier, which builds the above selector and filter examples, transforms
the string literal arguments of the print function to wrap them in a hypothetical
translation function (tr
):
def translate_string(node: LN, capture: Capture, filename: Filename) -> None:
args = capture.get("print_args")
# make sure the arguments haven't already been modified
if args and args[0].type == TOKEN.STRING:
args[0].replace(
Call(
Name("tr"),
args=[args[0].clone()],
)
)
(
Query()
...
.modify(translate_string)
...
)
The modifier function replaces the existing string literal element with the nested
nodes to represent a call to the hypothetical translation function (tr
), with a clone
of the existing string literal as the only argument.
Note: take care to ensure that matched and captured elements still meet the expectations of your modifier function. They may have already been transformed (or entirely removed or replaced) by previous modifiers.
Processors
Bowler recognizes that there are cases where it's useful to post-process final changes to the codebase, and provides a mechanism for attaching "processors" to the query. These are functions that will be executed for each individual "hunk", or change, to a file – the same "hunks" that you would see in a unified diff format – and optionally decide whether that hunk should be applied or skipped. This allows for behavior such as extra logging, selective application of hunks based on conditions, record keeping for future queries, and much more.
An example processor that keeps track of every file touched by the query:
MODIFIED: Set[Filename] = set()
def modified_files(filename: Filename, hunk: Hunk) -> bool:
MODIFIED.add(filename)
return True
(
Query()
...
.process(modified_files)
...
)
Execution
After building a query with some combination of selectors, filters, modifiers, and/or
processors, there are a few different execution commands that Bowler takes to determine
how it will apply the query to the codebase. The default behavior when calling
.execute()
on the query is to generate an interactive diff – similar to that provided
by git add -p
– and show each hunk in series, asking the user to apply or skip that
hunk. This is the "safest" option, as the user can verify that each change is intended
and meets the expectations of the refactoring. There are also options to just generate
a diff without applying hunks, or to apply all hunks without asking, and there are
shortcut methods for each of these options as well.
(
Query()
...
.execute(
interactive = True, # ask about each hunk
write = False, # automatically apply each hunk
silent = False, # don't ask or print hunks at all
)
)