AST Grep

Sometimes you need search-and-replace that understands code structure rather than plain text, and matching on the abstract syntax tree (AST) gives you that. The tool is called ast-grep, and you invoke it with sg.

Pet Project: Extracting all documentation from hyper-div

See the finished script here

I want to create LLM-consumable documentation for hyper-div. Normally I’d use a web crawler for the extraction, but the pages are all generated by JavaScript, and I’m too lazy to set up a crawler that drives a real browser.

It turns out the hyper-div documentation is generated from Python files, and most of the relevant text sits in the arguments of output calls such as hd.markdown, hd.heading, and p.title. For example:

@router.route("/guide/loops")
def loops():
    with page() as p:
        p.title("# Rendering in Loops")

        docs_markdown(
            """

            When rendering components in loops, we have to take a bit
            of extra precaution. For example, if we want to render 5
            sliders in a loop, this code will not work:
            """)

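Now we want to extract all the documentation from the Python files. With ast-grep, a single pattern is enough to find every docs_markdown call; $PARAM is a metavariable that matches the call's argument: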

sg --pattern 'docs_markdown($PARAM)' hyperdiv_docs/pages/**/*.py
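To make concrete what that pattern captures, here is a rough stdlib-only equivalent written with Python's ast module rather than ast-grep itself. It is a sketch under simplifying assumptions: it only looks at calls to a bare name with exactly one positional argument, and the helper name find_single_arg_calls and the sample source are made up for illustration.

```python
import ast


def find_single_arg_calls(source: str, func_name: str) -> list:
    """Return the source text of the sole argument of each func_name(...) call."""
    captured = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)  # bare name, not obj.method
            and node.func.id == func_name
            and len(node.args) == 1              # assumption: one positional arg
            and not node.keywords
        ):
            # Roughly what the $PARAM metavariable would hold.
            captured.append(ast.get_source_segment(source, node.args[0]))
    return captured


# Hypothetical sample: one real call, plus two look-alikes that are not calls.
SAMPLE = '''
def loops():
    docs_markdown("""
        When rendering components in loops...
    """)
    # docs_markdown("commented out, so not a call")
    text = "docs_markdown(not code)"
'''

matches = find_single_arg_calls(SAMPLE, "docs_markdown")
print(matches)
```

The comment and the string literal are skipped because neither is a call node in the tree, which is the advantage over a plain-text grep.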

To match several kinds of call at once, put the patterns in a YAML rule file (doc_puller.yaml here) and combine them with any:

# YAML Rule is more powerful!
# https://ast-grep.github.io/guide/rule-config.html#rule
id: main
language: python
rule:
  any:
    - pattern: hd.markdown($A)
    - pattern: p.title($T)
    - pattern: docs_markdown($DM)
    - pattern: code_example($CE,$PARAM2) # code_example has 2 params
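The any combinator behaves like a logical OR over its patterns. As an illustration only, the sketch below imitates that logic with Python's ast module. The names WANTED, callee_name, and match_any are hypothetical, and so is the sample source, including the second argument passed to code_example.

```python
import ast

# Callee name -> argument count expected by the matching pattern in the rule.
WANTED = {"hd.markdown": 1, "p.title": 1, "docs_markdown": 1, "code_example": 2}


def callee_name(func):
    """Return 'name' or 'obj.attr' for simple callees, otherwise None."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def match_any(source: str) -> list:
    """Return sorted (line number, callee) pairs for calls matching any entry."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = callee_name(node.func)
        arg_count = len(node.args) + len(node.keywords)
        if name in WANTED and arg_count == WANTED[name]:
            hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical sample source; hd.button is not in WANTED, so it is ignored.
SAMPLE = '''
with page() as p:
    p.title("# Rendering in Loops")
    docs_markdown("Some prose.")
    code_example(demo, "caption")
    hd.button("not documentation")
'''

for lineno, name in match_any(SAMPLE):
    print(lineno, name)
```

The YAML rule remains the real tool here; the sketch only shows why a call is reported as soon as one of the listed patterns fits it.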

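Scanning with this rule prints the matches in a human-readable form by default. Adding --json makes the output machine-readable, and jq can then reduce each match to its file and matched lines: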

sg scan --rule doc_puller.yaml --json | jq -r '.[] | {file: .file, lines: .lines} | "\(.file)\n\(.lines)"'
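If you would rather post-process the JSON in Python than in jq, a minimal sketch could look like the following. It assumes only what the jq filter relies on, namely that the output is a JSON array whose items carry file and lines fields. The function name format_matches and the sample record are hypothetical.

```python
import json


def format_matches(json_text: str) -> str:
    """Mirror the jq filter: the file path, then the matched lines, per match."""
    matches = json.loads(json_text)
    return "\n".join(f"{m['file']}\n{m['lines']}" for m in matches)


# Hypothetical record shaped like the `file` and `lines` fields used above.
SAMPLE_JSON = json.dumps(
    [{"file": "pages/example.py", "lines": '    p.title("# Rendering in Loops")'}]
)

print(format_matches(SAMPLE_JSON))
```

To use it on real output, read the JSON from standard input with sys.stdin.read() and pass the text to format_matches.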

Notes

  • When you’re experimenting with patterns and rules, you’ll certainly want to use the ast-grep playground