There are many days when I don't feel like working on my project. I use this feeling to "productively procrastinate" on things that I've been wanting to do but haven't done yet. Earlier this week I decided to tackle two related problems:

1. Find all the pages on my site that are reachable from the home page, so I can review the list.
2. Help readers who follow a broken link and end up on my 404 page.
To implement this, I parsed each page and found the links using a regular expression pattern and some quick-and-dirty code:
    import re
    import urllib.parse

    RE_anchor = re.compile(r'<a[^>]* href="([^"#]+)[^"]*"')

    def url_links_to(relativeurl, html_contents):
        "Return the list of relative links from this page"
        site = "https://www.redblobgames.com"
        urls = []
        for url in RE_anchor.findall(html_contents):
            if url.find("mailto:") == 0: continue
            url = urllib.parse.urljoin(site + relativeurl, url)
            if not url.startswith(site): continue
            url = url.replace(site, "")
            if not (url.endswith(".html") or url.endswith("/")): continue
            if url.endswith("/index.html"): url = url.replace("/index.html", "/")
            if url == "/": url = "/index.html"
            if url in urls: continue
            urls.append(url)
        return urls
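The post doesn't show how the `link_map` used below gets built, so here's a minimal sketch of one way to do it, assuming the site is mirrored in a local directory whose file paths match the URL structure. The `build_link_map` name, the `./site` path, and the UTF-8 encoding are my assumptions, not from the original:

    import os

    def build_link_map(root="./site"):
        "Map each page's URL to the list of URLs it links to."
        link_map = {}
        for dirpath, _, filenames in os.walk(root):
            for filename in filenames:
                if not filename.endswith(".html"):
                    continue
                path = os.path.join(dirpath, filename)
                # Turn the file path back into a site-relative URL,
                # using the same normalization as url_links_to:
                # "/foo/index.html" becomes "/foo/", but the home
                # page stays "/index.html".
                url = "/" + os.path.relpath(path, root).replace(os.sep, "/")
                if url.endswith("/index.html") and url != "/index.html":
                    url = url[:-len("index.html")]
                with open(path, encoding="utf-8") as f:
                    link_map[url] = url_links_to(url, f.read())
        return link_map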
I then used depth-first search to find all the pages reachable from the home page:
    # link_map[url] = url_links_to(url, contents of page)
    def all_reachable_pages(link_map):
        "Return the set of all pages reachable from the home page"
        frontier = ["/index.html"]
        reached = set(frontier)
        while frontier:
            url = frontier.pop()
            if url not in link_map:
                print("WARNING: possible 404", url)
                continue
            for child in link_map[url]:
                if child not in reached:
                    frontier.append(child)
                    reached.add(child)
        return reached
For part 1, I made a list of the reachable pages and I plan to review it periodically.
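As a concrete version of that review (my sketch, building on the hypothetical `build_link_map` above, not code from the post): any page that has a file on disk but isn't in the reached set is a candidate orphan.

    def orphan_pages(link_map):
        "Pages that exist but aren't reachable from the home page."
        reached = all_reachable_pages(link_map)
        return sorted(set(link_map) - reached)

    link_map = build_link_map()
    for url in orphan_pages(link_map):
        print("unreachable:", url)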
For part 2, I wanted to help readers who encounter a 404 on my site. I looked through the 404s in my server logs to see what I might be able to help with. I found lots of bogus requests such as wp-admin and other admin URLs (people trying to break into my server), and also lots of what seemed to be buggy crawlers. But I also found many URLs that seem to come from real humans. These seem to come either from copy/paste errors or from forums automatically linkifying URLs:
The last one looks like a Markdown typo. There are also some that look like escaping/quoting errors:
All of these seem to have an unwanted suffix, so I decided to show a suggestion on the 404 page. I look for prefixes of the non-matching URL that match a valid URL, and pick the longest match:
    const request = window.location.pathname;
    let bestUrl = "";
    for (let url of urlsReachableFromHomePage) {
        if (url.length > bestUrl.length
                && request.slice(0, url.length) == url) {
            bestUrl = url;
        }
    }
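The 404 page needs `urlsReachableFromHomePage` to exist in the browser. One way to provide it (an assumption on my part; the post doesn't say how the list reaches the client) is to emit the reached set as a small JavaScript file at build time:

    import json

    def write_url_list(reached, filename="url-list.js"):
        "Emit the reachable URLs as a JS array the 404 page can load."
        with open(filename, "w", encoding="utf-8") as f:
            f.write("const urlsReachableFromHomePage = "
                    + json.dumps(sorted(reached)) + ";\n")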
You can try it out by clicking on the broken links above.
This was a relatively low-priority project, but so satisfying.
[Photo: Heavy French and Italian cavalry]
[Photo: Steel Fist again, remarkable figures, side-show Bob leading the charge]
[Photo: Earlier Italian knights from Steel Fist]
[Photo: Eureka again, very varied and clever designs]
[Photo: Foundry Gendarmes]
[Photo: Another view of Wargames Foundry]
[Photo: Foundry head on]
[Photo: All banners from Pete's Flags]