Working out website coverage bugs
I want more people to come to the website.
Currently - most (144 out of 179) of the pages on this website aren’t indexed. That means that google knows about them but doesn’t consider them important enough to add to the search database.
Broadly, nobody knows for sure how to solve this problem. The most common suggested solutions are sitemap related, but mine has been double checked.
My page experience is fine, and all other indicators are good:
To find out more I exported the lists of pages that are index and pages that aren’t to a google doc
After I looked at both lists, several things jumped out:
- Posting a link to twitter didn’t predict if google was going to index it.
- All links that I posted to Facebook did appear in the index.
- All pages that were linked to from one of my popular pages appeared in the index.
The last two didn’t cover all the pages.
So, there are two obvious next moves.
- I’ve added eight unindexed links in total to my three most popular pages.
- I’ve posted two different unindexed links to facebook.
- https://joereddington.github.io/2021/01/28/Finder-Zero-You-use-far-fewer-files-than-you-think,-so-stop-hoarding-them.html
- https://joereddington.github.io/2016/06/08/the-dvoark-keyboard-layout-and-what-i-learned/html
I’ll check back in a week and see if anything changed.
At the same time, when poking around the Google Search Console. I found a button that said “Do you want to validate the missing 140 links” and I was like… “Um, okay”, so I’ve pressed that button and we’ll see if it makes any difference.
EDIT: I checked in on the 8th September: the coverage was unchanged.
UPDATE TWO YEARS LATER
For some reason I happened to have a look at the Google Search Console again this week.
Suddenly, many more pages are index. 177 in fact.
(Noticeably, from my last experiment, the Finder Zero Post isn’t indexed, but the Dvoark keyboard one is)
I’ve also got a lot more information about why things might have been included or NOT.
The sitemap page now looks like this:
After a lot of looking I realised that the urls in the sitemap were in the form “https://joereddington.github.io/723/2011/01/03/emma/” rather than “https://joereddington.com/723/2011/01/03/emma/”, which I feel quite silly about.
I’ve updated the jekyll _config file and will come back in a couple of weeks to see if it made a different. My working assumption is that ‘the sitemap has always been broken in this way but that Google didn’t give error messages like this at the time, also, it’s quietly indexed a bunch more pages in the background from various links’.