AST Grep
Sometimes you need search/replace that understands code rather than plain text, so let’s use ASTs for that. The tool is called ast-grep (invoked as sg).
Pet Project: Extracting all documentation from hyper-div
See the finished script here
I want to create LLM-consumable documentation for hyper-div. Normally I’d use a web crawler for the extraction, but the pages are all generated by JavaScript and I’m too lazy to set up a browser-driven crawler.
It turns out the hyper-div documentation is generated from Python files, and most of the relevant docs can be extracted from the calls that emit output, e.g., hd.markdown, hd.heading, p.title. A typical page looks like this:
@router.route("/guide/loops")
def loops():
    with page() as p:
        p.title("# Rendering in Loops")
        docs_markdown(
            """
            When rendering components in loops, we have to take a bit
            of extra precaution. For example, if we want to render 5
            sliders in a loop, this code will not work:
            """)
Now we want to extract all the documentation from the Python files. We can use ast-grep to do this.
sg --pattern 'docs_markdown($PARAM)' hyperdiv_docs/pages/**/*.py
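To make it concrete what that pattern matches, here is a rough equivalent written with Python’s standard-library ast module instead of ast-grep. It is only a sketch: the helper name find_calls and the sample source are made up for illustration, and it approximates the single $PARAM metavariable by requiring exactly one positional argument.

```python
import ast

# Sample source modelled on the page above (shortened for illustration).
SAMPLE = '''
def loops():
    with page() as p:
        p.title("# Rendering in Loops")
        docs_markdown(
            """
            When rendering components in loops, we have to take a bit
            of extra precaution.
            """)
'''


def find_calls(source: str, name: str) -> list[str]:
    """Return the source text of each one-argument call to the bare name `name`."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)  # skips attribute calls like p.title(...)
            and node.func.id == name
            and len(node.args) == 1
            and not node.keywords
        ):
            found.append(ast.get_source_segment(source, node))
    return found


matches = find_calls(SAMPLE, "docs_markdown")
for text in matches:
    print(text)
```

ast-grep gives you the same kind of match without writing a tree walker, and its patterns work across languages, which is why the rest of this note sticks with sg.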
We can do something more complex with rules:
# YAML Rule is more powerful!
# https://ast-grep.github.io/guide/rule-config.html#rule
id: main
language: python
rule:
  any:
    - pattern: hd.markdown($A)
    - pattern: p.title($T)
    - pattern: docs_markdown($DM)
    - pattern: code_example($CE,$PARAM2) # code_example has 2 params
By default the scan prints human-readable output. To make it machine-readable, ask for JSON and filter it with jq:
sg scan --rule doc_puller.yaml --json | jq -r '.[] | {file: .file, lines: .lines} | "\(.file)\n\(.lines)"'
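If jq gets unwieldy, the same post-processing can be sketched in Python. This assumes only what the jq filter above relies on: the JSON is a list of match objects with file and lines fields. The sample matches, the file paths, and the helper names group_by_file and render are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical matches standing in for the real `--json` output.
sample_matches = [
    {"file": "pages/guide/loops.py", "lines": 'p.title("# Rendering in Loops")'},
    {"file": "pages/guide/loops.py", "lines": 'docs_markdown("When rendering components in loops")'},
    {"file": "pages/intro.py", "lines": 'hd.markdown("Welcome")'},
]
raw = json.dumps(sample_matches)


def group_by_file(raw_json: str) -> dict[str, list[str]]:
    """Collect the matched source lines per file, preserving match order."""
    grouped = defaultdict(list)
    for match in json.loads(raw_json):
        grouped[match["file"]].append(match["lines"])
    return dict(grouped)


def render(grouped: dict[str, list[str]]) -> str:
    """Emit one plain-text section per file: the path, then its matches."""
    sections = []
    for path, chunks in grouped.items():
        sections.append(path + "\n" + "\n".join(chunks))
    return "\n\n".join(sections)


grouped = group_by_file(raw)
print(render(grouped))
```

In practice you would read the JSON from standard input (for example with sys.stdin.read()) and pipe the sg scan command into the script instead of using the sample list.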
Notes
- When you’re experimenting with patterns, you’ll definitely want to use the ast-grep playground.