Sourceror (Sourceror v0.11.0)

Utilities to work with Elixir source code.

NOTICE: This library is under heavy development. Expect frequent breaking changes until the first stable v1.0 release is out.

Installation

Add :sourceror as a dependency to your project's mix.exs:

defp deps do
  [
    {:sourceror, "~> 0.11"}
  ]
end

A note on compatibility

Sourceror is compatible with Elixir versions down to 1.10 and OTP 21. For Elixir versions prior to 1.13 it uses a vendored version of the Elixir parser and formatter modules. As a consequence, on Elixir versions prior to 1.12 it will successfully parse the stepped range syntax (first..last//step) instead of raising a SyntaxError; everything else should work as expected.

Goals of the library

Be as close as possible to the standard Elixir AST.
Make working with comments as simple as possible.
No dev/prod dependencies, to simplify integration with other tools.

Sourceror's AST

Having the AST and comments as separate entities allows Elixir to expose the code formatting utilities without making any changes to its AST, but it also delegates to us the task of figuring out the most appropriate way to work with them.
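As a minimal illustration of that separation using only the standard library (Elixir 1.13+, where Code.string_to_quoted_with_comments/2 is public), parsing a commented expression returns the comments as a list next to the AST rather than inside it:

```elixir
source = """
# Comment for :a
:a
"""

# Elixir 1.13+: the AST and the comments come back as two separate values.
{:ok, quoted, comments} = Code.string_to_quoted_with_comments(source)

# The AST is just the bare atom; nothing in it points at the comment.
IO.inspect(quoted, label: "quoted")

# Each comment is a map with :line, :column, :text and EOL-count keys.
IO.inspect(comments, label: "comments")
```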

Sourceror's take is to use the node metadata to store the comments. This allows us to work with an AST that is as close to the regular Elixir AST as possible. It also allows you to move nodes around without worrying about leaving a comment behind and ending up with misplaced comments.

Two metadata fields are added to the regular Elixir AST:

:leading_comments - holds the comments directly above the node or on the same line as it. For example:

test "parses leading comments" do
  quoted = """
  # Comment for :a
  :a # Also a comment for :a
  """ |> Sourceror.parse_string!()
  assert {:__block__, meta, [:a]} = quoted
  assert meta[:leading_comments] == [
    %{line: 1, column: 1, previous_eol_count: 1, next_eol_count: 1, text: "# Comment for :a"},
    %{line: 2, column: 4, previous_eol_count: 0, next_eol_count: 1, text: "# Also a comment for :a"},
  ]
end

:trailing_comments - holds the comments that are inside the node but don't lead any of its children. For example:

test "parses trailing comments" do
  quoted = """
  def foo() do
  :ok
  # A trailing comment
  end # Not a trailing comment for :foo
  """ |> Sourceror.parse_string!()
  assert {:__block__, block_meta, [{:def, meta, _}]} = quoted
  assert [%{line: 3, text: "# A trailing comment"}] = meta[:trailing_comments]
  assert [%{line: 4, text: "# Not a trailing comment for :foo"}] = block_meta[:trailing_comments]
end

Note that Sourceror considers leading comments to be the ones found on the same line as a node, and trailing comments to be the ones found before the ending line of a node, based on the end, closing or end_of_expression line. This also makes the Sourceror AST consistent with the way the Elixir formatter works, making it easier to reason about how a given AST would be formatted.

Traversing the AST

Elixir provides the Macro.prewalk, Macro.postwalk and Macro.traverse functions to traverse the AST. You can use the same functions to traverse the Sourceror AST as well, since it has the same shape as the standard Elixir AST.
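For instance, here is a minimal sketch that collects the names of local calls with Macro.prewalk/3. It parses with the standard Code.string_to_quoted/1 for brevity; a Sourceror AST has the same shape, although its literals appear wrapped in :__block__ nodes, so a real codemod would likely want to skip those.

```elixir
{:ok, quoted} = Code.string_to_quoted("foo(bar(1), baz())")

# Pre-order: a parent node is visited before its arguments.
{_quoted, names} =
  Macro.prewalk(quoted, [], fn
    # A call node has an atom name and a list of arguments.
    {name, _meta, args} = node, acc when is_atom(name) and is_list(args) ->
      {node, [name | acc]}

    # Anything else (literals, variables, ...) is left untouched.
    node, acc ->
      {node, acc}
  end)

Enum.reverse(names)
```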

Sourceror also provides the Sourceror.prewalk, Sourceror.postwalk and Sourceror.traverse variants. At the time of writing they are mostly wrappers around the standard Elixir functions for AST traversal, but they may be enhanced in the future if more AST formats are introduced.

In addition to these, Sourceror also provides a Zipper implementation for the Elixir AST. You can learn more about it in the Zippers notebook.

Patching the source code

You can use Sourceror to manipulate the AST and turn it back into human-readable Elixir code; this is commonly known as writing a "codemod". For example, you can write a codemod that replaces calls to String.to_atom with String.to_existing_atom:

test "updates the source code" do
  source =
    """
    String.to_atom(foo)\
    """

  new_source =
    source
    |> Sourceror.parse_string!()
    |> Macro.postwalk(fn
      {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_atom]}, call_meta, args} ->
        {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_existing_atom]}, call_meta, args}

      quoted ->
        quoted
    end)
    |> Sourceror.to_string()

  assert new_source ==
    """
    String.to_existing_atom(foo)\
    """
end

However, this will affect the whole source code, as we are working on the full source AST. Sourceror relies on the Elixir formatter to produce human-readable code, so the original code formatting will be lost by using Sourceror.to_string. If your code is already formatted with the Elixir formatter this won't be an issue, but it will be an undesirable effect if you're not using it.

An alternative to this is to use Patches instead. A patch is a data structure that specifies the text range that should be replaced, and either a replacement string or a function that takes the text in that range and produces a string replacement.

Using patches, we could do the same as above, but produce a patch instead of modifying the AST. As a result, only the parts that need to be changed will be affected, and the rest of the code keeps the original formatting:

test "patches the source code" do
  source =
    """
    case foo do
      nil ->         :bar
      _ ->

          String.to_atom(foo)

          end\
    """

  {_quoted, patches} =
    source
    |> Sourceror.parse_string!()
    |> Macro.postwalk([], fn
      {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_atom]}, call_meta, args} = quoted, patches ->
        range = Sourceror.get_range(quoted)
        replacement =
          {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_existing_atom]}, call_meta, args}
          |> Sourceror.to_string()

        patch = %{range: range, change: replacement}
        {quoted, [patch | patches]}

      quoted, patches ->
        {quoted, patches}
    end)

  assert Sourceror.patch_string(source, patches) ==
    """
    case foo do
      nil ->         :bar
      _ ->

          String.to_existing_atom(foo)

          end\
    """
end

You have to keep in mind that:

If you patch a node that has inner code, like replacing a full case, then the contents of the node will be reformatted as well.
At the moment, Sourceror won't check for conflicts between the patches' ranges, so care needs to be taken not to produce conflicting patches. If you find yourself generating conflicting patches, you may need to go through several parse -> patch -> reparse cycles.
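Since conflict detection is left to you, a small guard can help before calling Sourceror.patch_string/2. The module below is a hypothetical helper, not part of Sourceror; it assumes positions are 1-based and that a range's end position is exclusive, which matches the get_range/2 examples in the function reference.

```elixir
defmodule PatchCheck do
  # Hypothetical helper (not part of Sourceror) that detects overlapping ranges.

  # Positions are keyword lists like [line: 2, column: 5]. Tuples compare
  # element by element, so {line, column} orders positions in the source.
  defp to_tuple(pos), do: {Keyword.fetch!(pos, :line), Keyword.fetch!(pos, :column)}

  # Ranges are treated as half-open: one ending where another starts is fine.
  def conflicts?(patches) do
    patches
    |> Enum.map(fn %{range: %{start: s, end: e}} -> {to_tuple(s), to_tuple(e)} end)
    # After sorting by start, any overlap shows up between neighbouring ranges.
    |> Enum.sort()
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.any?(fn [{_start1, end1}, {start2, _end2}] -> start2 < end1 end)
  end
end

a = %{range: %{start: [line: 1, column: 1], end: [line: 1, column: 5]}, change: "x"}
b = %{range: %{start: [line: 1, column: 3], end: [line: 1, column: 8]}, change: "y"}
c = %{range: %{start: [line: 1, column: 5], end: [line: 1, column: 9]}, change: "z"}

PatchCheck.conflicts?([a, b])
```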

Some of the most common patching operations are available in the Sourceror.Patch module.

Background

There have been several attempts at source code manipulation in the Elixir community. Thanks to its metaprogramming features, Elixir provides built-in tools that let us get the AST of any Elixir code, but when it comes to turning the AST back into code as text, we had limited options. Macro.to_string/2 exists, but the code it produces is generally ugly, mostly because of extra parentheses or because it turns string interpolations into calls to Erlang modules, to name a few examples. This meant that, even if we used Macro.to_string/2 to get a string and then passed it to the Elixir formatter via Code.format_string!/2, the output would still be suboptimal, as the formatter is not designed to change the semantics of the code, only to pretty print it. For example, calls to Erlang modules would be kept as is instead of being turned back into interpolations.

We also had the additional problem of comments being discarded by the tokenizer, and of literals lacking information like line numbers or delimiter characters. This makes the regular AST too lossy to be useful for manipulating source code, because we need as much information as possible to stay as close to the source as possible. There have been several proposals in the past to bring all this information to the Elixir AST, but they all implied a change that would either break macros, due to the addition of new types of AST nodes, or compromise core Elixir itself by storing comments in the node metadata. A discussion on the Elixir mailing list highlights the various issues faced when deciding if and how comments should be preserved. Arjan Scherpenisse also gave a talk discussing the problems of using the standard Elixir AST to build refactoring tools.
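The lossiness is easy to observe with the standard parser. In this small sketch the comment disappears entirely, and a bare literal such as 42 has nowhere to store its line number, while the call wrapping it does get metadata:

```elixir
# Comments are dropped by the default tokenizer.
{:ok, 42} = Code.string_to_quoted("# the answer\n42")

# The call node carries metadata, but the literal argument is just an integer.
{:ok, {:foo, meta, [42]}} = Code.string_to_quoted("foo(42)")
IO.inspect(meta, label: "call metadata")
```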

Despite all these issues, the Elixir formatter is still capable of manipulating the source code to pretty print it. Under the hood it does some neat tricks to have all this information available: on one hand, it tells the tokenizer to extract the comments from the source code and keep them at hand (not in the AST itself, but as a separate data structure), and on the other hand it tells the parser to wrap literals in block nodes so metadata can be preserved. Once it has all it needs, it can convert the AST and comments into an algebra document, and ultimately convert that into a string. This functionality was private, and if we wanted to do it ourselves we would have to replicate or vendor the Elixir formatter, with its more than 2000 lines of code. This approach was explored by Wojtek Mach in wojtekmach/fix, but it involved vendoring the Elixir formatter code, was tightly coupled to the formatting process, and could break with any change in Elixir.

Since Elixir 1.13 this functionality from the formatter is finally exposed via the Code.string_to_quoted_with_comments/2 and Code.quoted_to_algebra/2 functions. The former gives us access to the list of comments in a shape the Elixir formatter is able to use, and the latter lets us turn any arbitrary Elixir AST into an algebra document. If we also give it the list of comments, it will merge them together, allowing us to format the AST and preserve the comments. Now all we need to care about is manipulating the AST; the formatter does the rest.
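A minimal round trip with those two functions could look like the sketch below (Elixir 1.13+). The parser options follow the ones described in the Code.quoted_to_algebra/2 documentation for keeping the AST close to the source; the width of 98 matches the formatter's default line length.

```elixir
source = """
# Greets someone
def hello(name),   do: "Hello, " <> name
"""

# Keep comments, wrap literals in :__block__ nodes so they carry metadata,
# and keep token metadata and the original escapes.
{:ok, quoted, comments} =
  Code.string_to_quoted_with_comments(source,
    literal_encoder: &{:ok, {:__block__, &2, [&1]}},
    token_metadata: true,
    unescape: false
  )

# ... manipulate `quoted` here ...

# Merge the comments back in while building the algebra document,
# then render it to a string.
formatted =
  quoted
  |> Code.quoted_to_algebra(comments: comments, escape: false)
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()

IO.puts(formatted)
```

The comment survives the round trip, while the irregular spacing is normalized by the formatter.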

Summary

Types

comment()

patch()

position()

range()

traversal_function()

Functions

append_comments(quoted, comments, position \\ :leading)

Appends comments to the leading or trailing comments of a node.

compare_positions(left, right)

Compares two positions.

correct_lines(meta, line_correction, opts \\ [])

Shifts the line numbers of the node or metadata by the given line_correction.

get_args(arg)

Returns the arguments of the node.

get_column(arg, default \\ 1)

Returns the column of a node. If none is found, the default value is returned (defaults to 1).

get_end_line(quoted, default \\ 1)

Returns the line where the given node ends. It recursively checks for end, closing and end_of_expression line numbers. If none is found, the default value is returned (defaults to 1).

get_end_position(quoted, default \\ [line: 1, column: 1])

Returns the end position of the quoted expression. It recursively checks for end, closing and end_of_expression positions. If none is found, the default value is returned (defaults to [line: 1, column: 1]).

get_line(arg, default \\ 1)

Returns the line of a node. If none is found, the default value is returned (defaults to 1).

get_meta(arg)

Returns the metadata of the given node.

get_range(quoted, opts \\ [])

Gets the range used by the given quoted expression in the source code.

get_start_position(quoted, default \\ [line: 1, column: 1])

Returns the start position of a node.

parse_expression(string, opts \\ [])

Parses a single expression from the given string. It tries to parse on a per-line basis.

parse_string(source)

Parses the source code into an extended AST suitable for source manipulation as described in Code.quoted_to_algebra/2.

parse_string!(source)

Same as parse_string/1 but raises on error.

patch_string(string, patches)

Applies one or more patches to the given string.

postwalk(quoted, fun)

Performs a depth-first post-order traversal of a quoted expression.

postwalk(quoted, acc, fun)

Performs a depth-first post-order traversal of a quoted expression with an accumulator.

prepend_comments(quoted, comments, position \\ :leading)

Prepends comments to the leading or trailing comments of a node.

prewalk(quoted, fun)

Performs a depth-first pre-order traversal of a quoted expression.

prewalk(quoted, acc, fun)

Performs a depth-first pre-order traversal of a quoted expression with an accumulator.

quoted_to_algebra(quoted, opts)

A wrapper around Code.quoted_to_algebra/2 for compatibility with Elixir versions prior to 1.13.

string_to_quoted(string, opts)

A wrapper around Code.string_to_quoted_with_comments/2 for compatibility with Elixir versions prior to 1.13.

string_to_quoted!(string, opts)

A wrapper around Code.string_to_quoted_with_comments!/2 for compatibility with Elixir versions prior to 1.13.

to_string(quoted, opts \\ [])

Converts a quoted expression to a string.

update_args(arg, fun)

Updates the arguments for the given node.

Types

comment()

Specs

comment() :: %{
  line: integer(),
  previous_eol_count: integer(),
  next_eol_count: integer(),
  text: String.t()
}

patch()

Specs

patch() :: %{
  optional(:preserve_indentation) => boolean(),
  :range => range(),
  :change => String.t() | (String.t() -> String.t())
}

position()

Specs

position() :: keyword()

range()

Specs

range() :: %{start: position(), end: position()}

traversal_function()

Specs

traversal_function() ::
  (Macro.t(), Sourceror.TraversalState.t() ->
     {Macro.t(), Sourceror.TraversalState.t()})

Functions

append_comments(quoted, comments, position \\ :leading)

Specs

append_comments(
  quoted :: Macro.t(),
  comments :: [comment()],
  position :: :leading | :trailing
) :: Macro.t()

Appends comments to the leading or trailing comments of a node.
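Because comments live in node metadata, appending them amounts to updating a keyword list. The module below is a hypothetical sketch of that idea, not Sourceror's implementation; it only handles three-element call/block nodes, which is one reason Sourceror wraps literals in :__block__ nodes.

```elixir
defmodule CommentSketch do
  # Hypothetical sketch: comments are stored in the node metadata under
  # :leading_comments or :trailing_comments.
  def append_comments({form, meta, args}, comments, position \\ :leading)
      when position in [:leading, :trailing] do
    key = if position == :leading, do: :leading_comments, else: :trailing_comments

    # Add the key if it is missing, otherwise append after the existing comments.
    meta = Keyword.update(meta, key, comments, &(&1 ++ comments))
    {form, meta, args}
  end
end

node = {:foo, [line: 1, leading_comments: [%{line: 1, text: "# first"}]], []}
{:foo, meta, []} = CommentSketch.append_comments(node, [%{line: 1, text: "# second"}])
Enum.map(meta[:leading_comments], & &1.text)
```

A prepend variant would differ only in the order of the concatenation (comments ++ existing).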

compare_positions(left, right)

Specs

compare_positions(position(), position()) :: :gt | :eq | :lt

Compares two positions.

Returns :gt if the first position comes after the second one, and :lt if it comes before it. If the two positions are equal, :eq is returned.

nil values for lines or columns are coalesced to 0 for integer comparisons.
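The rule can be sketched as comparing {line, column} tuples, which Elixir orders element by element. This is a hypothetical re-implementation for illustration only; in real code, call Sourceror.compare_positions/2.

```elixir
defmodule PositionSketch do
  # Hypothetical re-implementation of the documented comparison rule.
  def compare(left, right) do
    # Missing or nil lines/columns are coalesced to 0.
    left = {left[:line] || 0, left[:column] || 0}
    right = {right[:line] || 0, right[:column] || 0}

    # Lines are compared first; columns only break ties on the same line.
    cond do
      left > right -> :gt
      left < right -> :lt
      true -> :eq
    end
  end
end

PositionSketch.compare([line: 1, column: 5], [line: 2, column: 1])
```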

correct_lines(meta, line_correction, opts \\ [])

Specs

correct_lines(Macro.t() | Macro.metadata(), integer(), Macro.metadata()) ::
  Macro.t() | Macro.metadata()

Shifts the line numbers of the node or metadata by the given line_correction.

This function will update the :line, :closing, :do, :end and :end_of_expression line numbers of the node metadata if such fields are present.
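A hypothetical sketch of the metadata part of that behavior: it shifts :line, plus the :line nested inside :closing, :do, :end and :end_of_expression. Unlike Sourceror.correct_lines/3 it only accepts a metadata keyword list, not a whole node, and it is not the library's implementation.

```elixir
defmodule LineShift do
  # Hypothetical sketch; the nested keys mirror the fields listed above.
  @nested_keys [:closing, :do, :end, :end_of_expression]

  def shift(meta, correction) when is_list(meta) do
    Enum.map(meta, fn
      # The node's own line.
      {:line, line} when is_integer(line) ->
        {:line, line + correction}

      # Nested position metadata such as end: [line: 4, column: 1].
      {key, nested} when key in @nested_keys and is_list(nested) ->
        {key, shift(nested, correction)}

      # Columns and any other fields are left as they are.
      other ->
        other
    end)
  end
end

LineShift.shift([line: 1, end: [line: 4, column: 1], column: 2], 2)
```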

get_args(arg)

Specs

get_args(Macro.t()) :: [Macro.t()]

Returns the arguments of the node.

iex> Sourceror.get_args({:foo, [], [{:__block__, [], [:ok]}]})
[{:__block__, [], [:ok]}]

get_column(arg, default \\ 1)

Specs

get_column(Macro.t(), default :: integer() | nil) :: integer() | nil

Returns the column of a node. If none is found, the default value is returned (defaults to 1).

A default of nil may also be provided if the column number is meant to be coalesced with a value that is not known upfront.

iex> Sourceror.get_column({:foo, [column: 5], []})
5

iex> Sourceror.get_column({:foo, [], []}, 3)
3

get_end_line(quoted, default \\ 1)

Specs

get_end_line(Macro.t(), integer()) :: integer()

Returns the line where the given node ends. It recursively checks for end, closing and end_of_expression line numbers. If none is found, the default value is returned (defaults to 1).

iex> Sourceror.get_end_line({:foo, [end: [line: 4]], []})
4

iex> Sourceror.get_end_line({:foo, [closing: [line: 2]], []})
2

iex> Sourceror.get_end_line({:foo, [end_of_expression: [line: 5]], []})
5

iex> Sourceror.get_end_line({:foo, [closing: [line: 2], end: [line: 4]], []})
4

iex> """
...> alias Foo.{
...>   Bar
...> }
...> """ |> Sourceror.parse_string!() |> Sourceror.get_end_line()
3

get_end_position(quoted, default \\ [line: 1, column: 1])

Specs

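get_end_position(Macro.t(), position()) :: position()

Returns the end position of the quoted expression. It recursively checks for end, closing and end_of_expression positions. If none is found, the default value is returned (defaults to [line: 1, column: 1]).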

iex> quoted = ~S"""
...> A.{
...>   B
...> }
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 3, column: 1]

iex> quoted = ~S"""
...> foo do
...>   :ok
...> end
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 3, column: 1]

iex> quoted = ~S"""
...> foo(
...>   :a,
...>   :b
...>    )
...> """ |>  Sourceror.parse_string!()
iex> Sourceror.get_end_position(quoted)
[line: 4, column: 4]

get_line(arg, default \\ 1)

Specs

get_line(Macro.t(), default :: integer() | nil) :: integer() | nil

Returns the line of a node. If none is found, the default value is returned (defaults to 1).

A default of nil may also be provided if the line number is meant to be coalesced with a value that is not known upfront.

iex> Sourceror.get_line({:foo, [line: 5], []})
5

iex> Sourceror.get_line({:foo, [], []}, 3)
3

get_meta(arg)

Specs

get_meta(Macro.t()) :: Macro.metadata()

Returns the metadata of the given node.

iex> Sourceror.get_meta({:foo, [line: 5], []})
[line: 5]

get_range(quoted, opts \\ [])

Gets the range used by the given quoted expression in the source code.

The quoted expression must have at least line and column metadata, otherwise it is not possible to calculate an accurate range, or to calculate it at all. This function is most useful when used after Sourceror.parse_string/1, before any kind of modification to the AST.

The range is a map with :start and :end positions.

iex> quoted = ~S"""
...> def foo do
...>   :ok
...> end
...> """ |> Sourceror.parse_string!()
iex> Sourceror.get_range(quoted)
%{start: [line: 1, column: 1], end: [line: 3, column: 4]}

iex> quoted = ~S"""
...> Foo.{
...>   Bar
...> }
...> """ |> Sourceror.parse_string!()
iex> Sourceror.get_range(quoted)
%{start: [line: 1, column: 1], end: [line: 3, column: 2]}

Options

:include_comments - when true, the comments are included in the range. Defaults to false.

iex> ~S"""
...> # Foo
...> :baz # Bar
...> """
...> |> Sourceror.parse_string!()
...> |> Sourceror.get_range(include_comments: true)
%{start: [line: 1, column: 1], end: [line: 2, column: 11]}
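To see what a range covers, it can help to slice the source text with it. The module below is a hypothetical helper, not part of Sourceror. It assumes 1-based lines and columns, an exclusive end position (as in the examples above, where the "end" keyword on line 3 yields column 4), "\n" line endings, and columns that count characters.

```elixir
defmodule RangeText do
  # Hypothetical helper: returns the source text covered by a range map.
  def slice(source, %{start: start, end: stop}) do
    lines = String.split(source, "\n")
    {sl, sc} = {start[:line], start[:column]}
    {el, ec} = {stop[:line], stop[:column]}

    if sl == el do
      # Same line: take (end column - start column) characters.
      lines |> Enum.at(sl - 1) |> String.slice(sc - 1, ec - sc)
    else
      first_line = Enum.at(lines, sl - 1)
      first = String.slice(first_line, sc - 1, String.length(first_line))
      # Whole lines strictly between the start and end lines.
      middle = Enum.slice(lines, sl, el - sl - 1)
      last = lines |> Enum.at(el - 1) |> String.slice(0, ec - 1)
      Enum.join([first | middle] ++ [last], "\n")
    end
  end
end

source = "def foo do\n  :ok\nend\n"
RangeText.slice(source, %{start: [line: 1, column: 1], end: [line: 3, column: 4]})
```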

get_start_position(quoted, default \\ [line: 1, column: 1])

Specs

get_start_position(Macro.t(), position()) :: position()

Returns the start position of a node.

iex> quoted = Sourceror.parse_string!(" :foo")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 2]

iex> quoted = Sourceror.parse_string!("\n\nfoo()")
iex> Sourceror.get_start_position(quoted)
[line: 3, column: 1]

iex> quoted = Sourceror.parse_string!("Foo.{Bar}")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

iex> quoted = Sourceror.parse_string!("foo[:bar]")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

iex> quoted = Sourceror.parse_string!("foo(:bar)")
iex> Sourceror.get_start_position(quoted)
[line: 1, column: 1]

parse_expression(string, opts \\ [])

Specs

parse_expression(String.t(), keyword()) ::
  {:ok, Macro.t(), String.t()} | {:error, String.t()}

Parses a single expression from the given string. It tries to parse on a per-line basis.

Returns {:ok, quoted, rest} on success or {:error, source} on error.

Examples

iex> ~S"""
...> 42
...>
...> :ok
...> """ |> Sourceror.parse_expression()
{:ok, {:__block__, [trailing_comments: [], leading_comments: [],
                    token: "42", line: 2, column: 1], [42]}, "\n:ok"}

Options

:from_line - The line at which parsing should start. Defaults to 1.

parse_string(source)

Specs

parse_string(String.t()) :: {:ok, Macro.t()} | {:error, term()}

Parses the source code into an extended AST suitable for source manipulation as described in Code.quoted_to_algebra/2.

Two additional fields are added to the node metadata:

:leading_comments - a list holding the comments found before the node.
:trailing_comments - a list holding the comments found before the end of the node. For example, comments right before the end keyword.

Comments are the same maps returned by Code.string_to_quoted_with_comments/2.

parse_string!(source)

Specs

parse_string!(String.t()) :: Macro.t()

Same as parse_string/1 but raises on error.

patch_string(string, patches)

Specs

patch_string(String.t(), [patch()]) :: String.t()

Applies one or more patches to the given string.

This function applies the patches in order, but it does not check for overlapping ranges, so make sure to pass non-overlapping patches.
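As a rough illustration of what applying non-overlapping patches involves, here is a deliberately simplified, hypothetical applier; it is not Sourceror's implementation. It ignores :preserve_indentation, assumes 1-based, end-exclusive positions and "\n" line endings, and applies patches from the last range to the first, so that each replacement cannot shift the positions of the patches still to be applied.

```elixir
defmodule PatchSketch do
  # Hypothetical, simplified applier; use Sourceror.patch_string/2 in real code.
  def patch(source, patches) do
    patches
    # Last range first, so earlier offsets stay valid after each replacement.
    |> Enum.sort_by(fn %{range: %{start: s}} -> {s[:line], s[:column]} end, :desc)
    |> Enum.reduce(source, fn %{range: %{start: from_pos, end: to_pos}, change: change}, acc ->
      from = offset(acc, from_pos)
      to = offset(acc, to_pos)
      original = String.slice(acc, from, to - from)

      # The change is either a replacement string or a function of the original text.
      replacement = if is_function(change, 1), do: change.(original), else: change

      String.slice(acc, 0, from) <> replacement <> String.slice(acc, to, String.length(acc))
    end)
  end

  # Converts a [line: l, column: c] position into a 0-based character offset.
  defp offset(source, pos) do
    source
    |> String.split("\n")
    |> Enum.take(pos[:line] - 1)
    |> Enum.reduce(pos[:column] - 1, fn line, total -> total + String.length(line) + 1 end)
  end
end

PatchSketch.patch("hello :world\n", [
  %{change: &String.upcase/1, range: %{start: [line: 1, column: 7], end: [line: 1, column: 13]}}
])
```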

A patch is a map containing at least the range that it should patch, and the change to be applied in the range, for example:

iex> original = ~S"""
...> if not allowed? do
...>   raise "Not allowed!"
...> end
...> """
iex> patch = %{
...>   change: "unless allowed? do\n  raise \"Not allowed!\"\nend",
...>   range: %{start: [line: 1, column: 1], end: [line: 3, column: 4]}
...> }
iex> Sourceror.patch_string(original, [patch])
~S"""
unless allowed? do
  raise "Not allowed!"
end
"""

The change can also be a function, in which case the original text in the patch range will be given as its argument:

iex> original = ~S"""
...> hello :world
...> """
iex> patch = %{
...>   change: &String.upcase/1,
...>   range: %{start: [line: 1, column: 7], end: [line: 1, column: 13]}
...> }
iex> Sourceror.patch_string(original, [patch])
~S"""
hello :WORLD
"""

By default, if the change is a text string, the patch will be automatically indented to match the indentation of the range it replaces:

iex> original = ~S"""
...> foo do bar do
...>   :ok
...>   end end
...> """
iex> patch = %{
...>   change: "baz do\n  :not_ok\nend",
...>   range: %{start: [line: 1, column: 8], end: [line: 3, column: 6]}
...> }
iex> Sourceror.patch_string(original, [patch])
~S"""
foo do baz do
    :not_ok
  end end
"""

If you don't want this behavior, you can add preserve_indentation: false to your patch:

iex> original = ~S"""
...> foo do bar do
...>   :ok
...>   end end
...> """
iex> patch = %{
...>   change: "baz do\n  :not_ok\nend",
...>   range: %{start: [line: 1, column: 8], end: [line: 3, column: 6]},
...>   preserve_indentation: false
...> }
iex> Sourceror.patch_string(original, [patch])
~S"""
foo do baz do
  :not_ok
end end
"""

postwalk(quoted, fun)

Specs

postwalk(Macro.t(), traversal_function()) :: Macro.t()

Performs a depth-first post-order traversal of a quoted expression.

See postwalk/3 for more information.

postwalk(quoted, acc, fun)

Specs

postwalk(Macro.t(), term(), traversal_function()) :: {Macro.t(), term()}

Performs a depth-first post-order traversal of a quoted expression with an accumulator.

fun is a function that receives the current node as its first argument and the traversal state as its second one. It must return a {quoted, state} tuple, in the same way it would return {quoted, acc} when using Macro.postwalk/3.

The state is a map with the following keys:

:acc - The accumulator. Defaults to nil if none is given.
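Since these traversal functions are described as mostly wrappers around the standard ones, the state-map contract can be sketched on top of Macro.postwalk/3. The module below is hypothetical and for illustration only; it is not Sourceror's actual implementation.

```elixir
defmodule TraversalSketch do
  # Hypothetical wrapper: fun receives the node and a %{acc: acc} state map
  # and must return {node, state}.
  def postwalk(quoted, fun) do
    {quoted, _acc} = postwalk(quoted, nil, fun)
    quoted
  end

  def postwalk(quoted, acc, fun) do
    {quoted, %{acc: acc}} = Macro.postwalk(quoted, %{acc: acc}, fun)
    {quoted, acc}
  end
end

{:ok, quoted} = Code.string_to_quoted("foo(1, bar(2, 3))")

# Count the integer literals, updating the :acc key of the state.
{_quoted, count} =
  TraversalSketch.postwalk(quoted, 0, fn
    n, state when is_integer(n) -> {n, %{state | acc: state.acc + 1}}
    node, state -> {node, state}
  end)
```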

prepend_comments(quoted, comments, position \\ :leading)

Specs

prepend_comments(
  quoted :: Macro.t(),
  comments :: [comment()],
  position :: :leading | :trailing
) :: Macro.t()

Prepends comments to the leading or trailing comments of a node.

prewalk(quoted, fun)

Specs

prewalk(Macro.t(), traversal_function()) :: Macro.t()

Performs a depth-first pre-order traversal of a quoted expression.

See prewalk/3 for more information.

prewalk(quoted, acc, fun)

Specs

prewalk(Macro.t(), term(), traversal_function()) :: {Macro.t(), term()}

Performs a depth-first pre-order traversal of a quoted expression with an accumulator.

The state is a map with the following keys:

:acc - The accumulator. Defaults to nil if none is given.

quoted_to_algebra(quoted, opts)

(macro)

A wrapper around Code.quoted_to_algebra/2 for compatibility with Elixir versions prior to 1.13.

string_to_quoted(string, opts)

(macro)

A wrapper around Code.string_to_quoted_with_comments/2 for compatibility with Elixir versions prior to 1.13.

string_to_quoted!(string, opts)

(macro)

A wrapper around Code.string_to_quoted_with_comments!/2 for compatibility with Elixir versions prior to 1.13.

to_string(quoted, opts \\ [])

Specs

to_string(Macro.t(), keyword()) :: String.t()

Converts a quoted expression to a string.

Comment line numbers are ignored; the line number of the associated node is used instead when formatting the code.

Options

:indent - how many indentations to insert at the start of each line. Note that this only prepends the indents without checking the indentation of nested blocks. Defaults to 0.
:indent_type - the type of indentation to use. It can be one of :spaces, :single_space or :tabs. Defaults to :spaces.
:format - if set to :splicing and the quoted expression is a list, the square brackets are stripped. This is useful to print a single element of a keyword list.

For more options see Code.format_string!/1 and Code.quoted_to_algebra/2.

update_args(arg, fun)

Specs

update_args(Macro.t(), ([Macro.t()] -> [Macro.t()])) :: Macro.t()

Updates the arguments for the given node.

iex> node = {:foo, [line: 1], [{:__block__, [line: 1], [2]}]}
iex> updater = fn args -> Enum.map(args, &Sourceror.correct_lines(&1, 2)) end
iex> Sourceror.update_args(node, updater)
{:foo, [line: 1], [{:__block__, [line: 3], [2]}]}