Path to philosophy from anywhere
Alexa, how do i scrape wikipedia?
I first read about Path to Philosophy on a subreddit called r/dailyprogrammer, if i remember correctly it was a posted as a medium difficult task. The rules are quite simple, start at any wikipedia page, follow the first link in the first paragraph, repeat until either Philosophy is reached or a loop is found.
First link meaning: the first link that isn’t enclosed in bracers from the top of the main bread text. The reason for not including links within bracers is because they are usually related to language pronounciation, which isn’t revelant to the content of the page. I’ve written a simple script that prints each page title until Philosophy or a loop is found. Loop here means when a page links to a previously visted page, this becomes a loop since the link is always the same of each page(unless someone edits the page during script run-time, pretty unlikely).
Anyways, here is the link to the script. There are probably cases where it breaks, also it could be faster. I might optimize it for another idea I have where you select a source and destination wiki-page, and then create a graph for all links and related links. With this graph perform Djikstra’s shortest path algorithm.