When writing articles, blog posts, technical papers, READMEs and the like, it's common to reference other web pages. Your content may not move, but there is no guarantee that other people's content stays put: sites go down, links change and certificates expire. Few things are as frustrating as finding that six-year-old blog post with the answers to your problems, only to find it full of dead links. Thankfully you can set a good example and check your own writing for dead links with a little Ruby tool called awesome_bot.
The name awesome_bot comes from the "awesome" lists that are rather popular on GitHub. If you don't know them, they are usually collections of links to "awesome" tools, resources, libraries and so on for a specific topic or ecosystem. Examples are awesome-dotnet, awesome-react, awesome-ai-awesomeness and awesome-ci; the list is endless. Needless to say, these pages contain lots and lots of links, hence the need to check that those links are still alive.
If you are into Ruby and gems, awesome_bot can be installed with:
gem install awesome_bot
I prefer the container approach, for which the dkhamsing/awesome_bot image can be used.
An example execution of awesome_bot might look like this:
$ docker run --rm -v $(pwd):/mnt -t andmos/awesome-bot -f **/*.csv --allow-redirect --allow 429
> Checking links in misc/Roasteries.csv
> Will allow errors: 429
> Will allow redirects
Links to check: 14
  01. https://www.kaffebrenneriet.no/
  02. https://www.timwendelboe.no/
  03. https://sh.no/
  04. https://www.kaffa.no/
  05. https://www.srw.no/
  06. https://www.fjellbrent.no/
  07. https://jacobsensvart.no/
  08. https://www.facebook.com/stormkaffe
  09. https://www.pala.no/
  10. https://www.langorakaffe.no/
  11. https://inderoy.coffee/
  12. https://bonneribyen.no/
  13. https://www.facebook.com/brentkaffe/
  14. https://senjaroasters.com/
Checking URLs: ✓✓✓✓✓✓✓→✓✓✓✓✓✓
No issues :-)
In this example we check links from CSV files and use two flags: --allow-redirect to, well, allow redirects (without it, redirected links are reported as errors) and --allow 429 to whitelist the "Too Many Requests" status code.
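To make the effect of the two flags concrete, here is a rough shell sketch of the decision they influence. It is a simplified model, not awesome_bot's actual code: the function name link_verdict and its arguments are made up for illustration, and treating every 2xx status as a success is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: decides whether one HTTP status counts as an issue.
# It only models the flags as described above; it is not awesome_bot's code.
#   $1 = HTTP status code
#   $2 = "yes" if redirects are allowed (like --allow-redirect)
#   $3 = comma-separated status codes to tolerate (like --allow 429)
link_verdict() {
  status=$1
  allow_redirect=$2
  allowed=$3

  # Explicitly allowed status codes always pass.
  case ",$allowed," in
    *",$status,"*) echo "pass"; return 0 ;;
  esac

  case "$status" in
    2??) echo "pass" ;;   # assumption: any 2xx counts as success
    3??) if [ "$allow_redirect" = "yes" ]; then echo "pass"; else echo "issue"; fi ;;
    *)   echo "issue" ;;
  esac
}

link_verdict 200 no  ""     # pass
link_verdict 301 no  ""     # issue
link_verdict 301 yes ""     # pass
link_verdict 429 yes "429"  # pass
link_verdict 404 yes "429"  # issue
```

The point of the sketch is only that a redirect or a 429 is an issue by default and becomes a pass once the matching flag is given, while a 404 stays an issue either way.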
If something is off with a link, like a 404, awesome_bot will exit with a non-zero exit code and show the issues in the report:
$ docker run --rm -v $(pwd):/mnt andmos/awesome-bot -f *.md
> Checking links in README.md
Links to check: 1
  1. https://www.an.no/some/dead/link
Checking URLs: x

Issues :-(
> Links
  1. [L1] 404 https://www.an.no/some/dead/link
> Dupes
  None ✓

Wrote results to ab-results-README.md.json
Wrote filtered results to ab-results-README.md-filtered.json
Wrote markdown table results to ab-results-README.md-markdown-table.json
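Because a failed check is signalled through the exit code, a script can react to it. The following is a minimal sketch; run_link_check is a hypothetical wrapper name, and the commented-out docker invocation simply mirrors the command used in this post.

```shell
#!/bin/sh
# run_link_check is a hypothetical wrapper: it runs whatever command it is
# given and reports the outcome based on that command's exit code.
run_link_check() {
  if "$@"; then
    echo "link check passed"
  else
    status=$?
    echo "link check failed (exit code $status)" >&2
    return "$status"
  fi
}

# Assumed real-world invocation, mirroring the command from this post:
#   run_link_check docker run --rm -v "$(pwd)":/mnt andmos/awesome-bot -f *.md

# Stand-in commands that show both branches without needing Docker:
run_link_check true
run_link_check false || echo "would fail the build here"
```

Most CI systems already fail a step when a command returns a non-zero exit code, so a wrapper like this is mainly useful in local scripts or when you want a custom message.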
awesome_bot can be automated to run on a schedule with your favorite CI system. Here is a GitHub Actions example that runs on pull requests, on manual dispatch and every Monday at 13:00 UTC:
name: Verify Links

on:
  pull_request:
  workflow_dispatch:
  schedule:
    - cron: '0 13 * * 1'

jobs:
  Awesome-bot:
    name: Run Awesome-bot
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Verify Links
        run: |
          docker run --rm -v $(pwd):/mnt andmos/awesome-bot -f *.md --allow-redirect --allow 429 --allow-ssl --white-list "nasdaq.com,researchgate.net"