MarketRank: The Anti-SEO Ranking Algorithm
Abstract
The websites at the top of Google are not the highest quality websites, but the ones that put the most effort into SEO [1][2]. There is an entire industry devoted to gaming its way to the top of search results, which inevitably degrades those results over time.
We need a ranking algorithm that cannot be easily gamed. This paper describes MarketRank, a ranking algorithm that is immune to SEO and simple to compute. We discuss the advantages and limitations of MarketRank, and show how it compares to Google’s rankings.
1. Introduction and Motivation
Google’s search result quality has declined to the point where many have noticed [3][4][5].
High-quality websites that don’t optimize for the right keywords and backlinks are left behind in favor of more SEO-optimized websites. This has led to websites that many consider to be “spammy” or “low quality” being at the top of Google results.
A ranking algorithm can be judged by the results it produces when people try to game it. What we need is a ranking algorithm that can only be gamed by creating genuinely good content. If “gaming the system” and “creating high quality content” are the same thing, then we have created a good algorithm.
Quality is subjective, and many believe that Google’s results are still good enough. However, we believe that MarketRank captures a distinctly different and useful kind of quality.
2. The MarketRank Algorithm
2.1 The Market Analogy
PageRank [6] was built on the analogy of the research paper. A research paper with more citations is probably better.
MarketRank is built on the analogy of the market. An object with a higher market value is probably better.
In our case, the markets we refer to are online communities such as Reddit, Hacker News, and Twitter. Each of these communities has an upvote mechanism, and an upvote can be interpreted as a signal that the user believes the website is undervalued: the user is willing to bid up its price by spending an upvote on it.
And so, the oversimplified description of MarketRank is “just add up all the upvotes”. We have some more work to do to make the values accurate, but this is the general idea.
2.2 Raw Market Value
The raw market value of any website is the main score that it received in an online community. Reddit and Hacker News have an obvious main score in the number of upvotes. On Twitter there are multiple metrics to choose from, so for now we’re going to choose likes as our main score.
Once we have the main scores for a website across all communities, we can turn these raw values into something useful for comparison. To do so, we need to adjust for inflation and convert everything to one currency.
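As a minimal sketch, assuming each submission’s metrics arrive as a dictionary, picking the raw market value amounts to looking up one designated field per platform. The field names and the raw_market_value helper below are hypothetical; real platform APIs name these metrics differently.

```python
# Hypothetical mapping from platform to its "main score" field.
# Real APIs use different field names; these are illustrative only.
MAIN_SCORE_FIELD = {
    "reddit": "upvotes",
    "hackernews": "upvotes",
    "twitter": "likes",  # Twitter offers several metrics; likes are chosen for now
}


def raw_market_value(platform: str, metrics: dict) -> int:
    """Return the main score a submission received on the given platform."""
    return metrics[MAIN_SCORE_FIELD[platform]]


# Toy example: only the likes count toward a tweet's raw market value.
tweet = {"likes": 120, "retweets": 30, "replies": 4}
print(raw_market_value("twitter", tweet))  # 120
```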
2.3 Inflation
The top upvoted post of 2015 on Hacker News has 2,228 upvotes, while the top upvoted post of 2017 has 4,107 upvotes. Most online communities grow steadily over time, and they all have a recency bias: new users rarely end up voting on older content.
As a result, the true value of an upvote changes over time. As a platform gains more users, votes become easier to accumulate, and therefore less valuable.
If we merely took the raw score from a platform, our rankings would inherit this recency bias, and we would not be able to tell which websites are actually the best.
To adjust for inflation, we compare the value of an equivalent basket of goods over time. In our naive implementation, the basket of goods is the top 50 most upvoted submissions of each year. We use the previous year as our baseline, since the current year is incomplete, and convert everything to 2021 points.
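The inflation adjustment can be sketched as follows, assuming we already have every submission’s raw score on one platform grouped by year. The function names and data layout are hypothetical, and basket_size defaults to the 50 used in the text. Each year’s price level is the average score of its top submissions, and a score from a given year is multiplied by the ratio of the 2021 price level to that year’s price level, much like a consumer price index.

```python
from statistics import mean

BASE_YEAR = 2021  # the most recent complete year serves as the baseline


def basket_price(scores: list[float], basket_size: int = 50) -> float:
    """Average score of the top `basket_size` submissions (the basket of goods)."""
    top = sorted(scores, reverse=True)[:basket_size]
    return mean(top)


def inflation_factors(
    scores_by_year: dict[int, list[float]],
    base_year: int = BASE_YEAR,
    basket_size: int = 50,
) -> dict[int, float]:
    """Multiplier that converts a score from each year into base-year points."""
    base_price = basket_price(scores_by_year[base_year], basket_size)
    return {
        year: base_price / basket_price(scores, basket_size)
        for year, scores in scores_by_year.items()
    }


def adjust_for_inflation(raw_score: float, year: int, factors: dict[int, float]) -> float:
    """Express a raw score from `year` in base-year points."""
    return raw_score * factors[year]


# Toy numbers (not real platform data), with a basket of 2 instead of 50.
scores_by_year = {2015: [500, 300, 10], 2021: [1000, 600, 20]}
factors = inflation_factors(scores_by_year, basket_size=2)
print(factors)  # {2015: 2.0, 2021: 1.0}
print(adjust_for_inflation(150, 2015, factors))  # 300.0
```

In this toy example, upvotes in 2021 are twice as easy to collect as in 2015, so a 2015 post with 150 upvotes is worth 300 points in 2021 terms.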
For reference, here is a graph of inflation on Hacker News for blogs in the Blog Surf directory [7].
2.4 Currency Conversion
After adjusting for inflation, every score is denominated in 2021 points, but each platform still has its own currency. How many Twitter likes are worth one Hacker News upvote?
We must denominate everything in one currency to be able to make meaningful comparisons between them. It makes no difference which currency we choose, so in this case we will go with Reddit upvotes.
Since we’ve adjusted for inflation on every platform, we’re really converting everything to 2021 Reddit upvotes.
Our naive currency conversion works exactly like our naive inflation adjustment: we compare the cost of a similar basket of goods across the different platforms. Here the basket mirrors the one used for inflation, i.e. the average of the top 50 highest-ranking inflation-adjusted websites on each platform.
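Currency conversion reuses the same basket calculation, this time across platforms instead of years. In the sketch below, which assumes each platform’s scores have already been inflation-adjusted to 2021 points, a platform’s exchange rate is the Reddit basket price divided by that platform’s basket price. As before, the names are hypothetical and the numbers are toy data.

```python
from statistics import mean

BASE_CURRENCY = "reddit"  # everything is denominated in Reddit upvotes


def basket_price(adjusted_scores: list[float], basket_size: int = 50) -> float:
    """Average of the top `basket_size` inflation-adjusted scores on a platform."""
    top = sorted(adjusted_scores, reverse=True)[:basket_size]
    return mean(top)


def exchange_rates(
    adjusted_scores_by_platform: dict[str, list[float]],
    base: str = BASE_CURRENCY,
    basket_size: int = 50,
) -> dict[str, float]:
    """Base-currency units that one point on each platform is worth."""
    base_price = basket_price(adjusted_scores_by_platform[base], basket_size)
    return {
        platform: base_price / basket_price(scores, basket_size)
        for platform, scores in adjusted_scores_by_platform.items()
    }


def to_base_currency(score: float, platform: str, rates: dict[str, float]) -> float:
    """Convert a platform score (2021 points) into 2021 Reddit upvotes."""
    return score * rates[platform]


# Toy numbers (not real platform data), with a basket of 2 instead of 50.
scores = {
    "reddit": [3000, 1000, 5],
    "hackernews": [1500, 500, 2],
    "twitter": [5000, 3000, 10],
}
rates = exchange_rates(scores, basket_size=2)
print(rates)  # {'reddit': 1.0, 'hackernews': 2.0, 'twitter': 0.5}
print(to_base_currency(300, "hackernews", rates))  # 600.0
```

Because the rates are ratios of basket prices, choosing a different base currency would rescale every value by the same constant and leave the ranking unchanged, which is why the choice of currency makes no difference.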
Here is the difference in value of that basket of goods across platforms.
2.5 GDP
Finally, we have a value that lets us compare scores across any platform and from any year. We can now calculate the “GDP” of a domain by summing the value of all webpages the domain has produced.
We count each webpage only once: if a page has received multiple values, we keep the highest one. We do this because we believe false positives are extremely unlikely in any moderated online community, whereas false negatives may be common, so a page’s highest score is the best estimate of its true value.
Summing all the value each domain has produced gives us a ranking of domains.
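The GDP step can be sketched as deduplicating pages by URL (keeping each page’s highest value) and then summing per domain. This assumes every observation is already expressed in 2021 Reddit upvotes and that URLs are already normalized; grouping by the URL’s hostname is our simplifying assumption, and the example domains are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_gdp(observations: list[tuple[str, float]]) -> dict[str, float]:
    """Sum each domain's page values, counting every page once at its highest value.

    `observations` holds (url, value) pairs, with values already converted to
    2021 Reddit upvotes. A page may appear several times, e.g. once per community.
    """
    best_per_page: dict[str, float] = {}
    for url, value in observations:
        if url not in best_per_page or value > best_per_page[url]:
            best_per_page[url] = value

    gdp: dict[str, float] = defaultdict(float)
    for url, value in best_per_page.items():
        # Hostname is used as-is; "www." prefixes and subdomains are not merged.
        gdp[urlsplit(url).hostname] += value
    return dict(gdp)


def rank_domains(gdp: dict[str, float]) -> list[tuple[str, float]]:
    """Domains ordered from highest to lowest GDP."""
    return sorted(gdp.items(), key=lambda item: item[1], reverse=True)


observations = [
    ("https://a.example/post-1", 300.0),
    ("https://a.example/post-1", 450.0),  # same page, second community: keep the max
    ("https://a.example/post-2", 100.0),
    ("https://b.example/essay", 500.0),
]
print(rank_domains(domain_gdp(observations)))
# [('a.example', 550.0), ('b.example', 500.0)]
```

Taking the maximum rather than the sum per page keeps a page that was submitted to many communities from being counted several times, in line with the rule above.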
Here we have a sample showing the top 10 domains in the Blog Surf directory ranked by GDP.
3. Google vs. MarketRank
This all sounds good in theory, but does it work in reality? Let’s find out by comparing MarketRank and Google.
Google doesn’t publish its site rankings, and any Google search will be a mix of relevance and quality. We have tried to approximate Google’s quality ranking of pages by finding queries where all the results seem equally relevant. For such queries, the result order should be determined mostly by Google’s quality ranking.
This is a noisy and non-ideal measurement, but will be good enough to tell us something about the performance of MarketRank.
3.1 Zero To One Book Review
We will try to find some book reviews of Peter Thiel’s Zero To One with the query “book review zero to one”.
Google and MarketRank both agree that the Atlantic post is the best, but after that we diverge significantly. The rest of Google’s first page of results have a MarketRank of 0, meaning they are either low quality, or unknown quality.
On page 2 we find Farnam Street and Slate Star Codex, both of which have a decent MarketRank.
Here is a table showing the MarketRank of each of Google’s results.
We do not think it is much of a leap to say that the Farnam Street and Slate Star Codex articles are higher quality than most of the articles on Google’s first page of results.
We are also going to take a wild guess that the Times of India focuses more on SEO than Slate Star Codex does.
3.2 Startup Ideas
Another way to try to figure out what pages Google thinks are high quality is with an exact match search. This is still noisy, because it may be more susceptible to keyword stuffing and other SEO hacks. In general, though, we assume that Google finds all the pages containing the exact phrase and then ranks them by quality.
Our exact matching phrase is “startup ideas”.
We are greeted first by an SEO optimized listicle from Nerdwallet, then another listicle, then a video on the YC site, then more listicles.
The discrepancy between these search results and the results produced by MarketRank is surprisingly large. Nothing on the first page of results has a MarketRank.
On page 4 of the results, we finally get to Paul Graham’s “How To Get Startup Ideas”.
No matter what Google’s search algorithm is doing to match the phrase “startup ideas”, we cannot make much sense of the fact that a page titled “How To Get Startup Ideas” on one of the most popular startup blogs is considered by Google to be page 4 garbage.
MarketRank knows this page has been shared and upvoted a lot across various communities, so it must be important. It ends up having a MarketRank of 3,389 points.
Other notable pages that Google considers garbage but MarketRank considers good are "Startup Ideas" by Gwern (399 points), and "Developing new startup ideas" by Chris Dixon (447 points).
After looking into the results more, it seems that Google may be optimizing so hard for recency that it doesn’t care at all about these older articles. That may explain this strange result. Perhaps Paul Graham should change his title to “How To Get Startup Ideas in 2022”.
3.3 CSS Centering Guide
If there’s one thing that every web developer has searched, it is the infamous question of how to center something in CSS. In this case, we chose the specific query “css centering guide” because those words appear in the titles of many results, which reduces relevance noise and leaves the result order mostly reflecting Google’s idea of quality.
This query shows that MarketRank and Google don’t always have to disagree.
In this case, we agree on what pages are high quality, though disagree slightly on the exact ordering of them.
3.4 Spam Detection
This is a small example that demonstrates the anti-SEO properties of MarketRank. When searching for “biggest mistakes that kill startups” on Google, we would expect to be greeted by the Paul Graham essay with a pretty similar title “The 18 Mistakes That Kill Startups”.
Instead, we are greeted by an article from Business Insider India called “Biggest Mistakes that kill Startups”. They do have the word “biggest” in the title, so maybe that makes it more relevant.
The real kicker is that “Biggest Mistakes that kill Startups” is a cheap copy-paste of Paul Graham’s essay with ads all over it.
MarketRank would have given that website 0 points, whereas Google believes it is a legitimate website.
In general, we don’t expect that a spammy or low-quality website would get many points on any legitimate online community, and believe that MarketRank is naturally good at filtering out spam.
4. Limitations and Future Work
This is a very naive implementation of MarketRank, and there are many obvious ways to improve it. However, beyond improving the implementation, there are more fundamental issues with MarketRank which we will discuss in this section.
4.1 What an Upvote Really Means
We claimed that upvotes are a good measure of quality, but this is not necessarily true. In reality, an upvote on any platform reflects a combination of quality and other factors, such as the popularity of the author and the quality of the discussion the article produces.
It is unclear what proportion of the value of an upvote is quality vs. everything else. Finding a better way to extract quality from upvotes will be an important problem to work on moving forward.
4.2 False Negatives and Low Coverage
One of the biggest issues with MarketRank is that many websites won’t have a MarketRank. A lot of the best websites will have one, but there will also be many good websites without a MarketRank.
The nature of these online communities leads to few false positives, but many false negatives. Many good webpages will not make it to the main feed or front page because of bad timing.
There’s only so much room on the front page, so no matter how much good stuff is submitted on a given day, only a few submissions get the chance to be fairly valued by the market.
Perhaps this should be fixed, but perhaps it is a feature and not a bug. It may be okay to optimize for a low false positive rate and high false negative rate. Most stuff on the internet is junk, and we may be okay with missing some good stuff as long as the stuff we have is definitely good.
5. A Blog Search Engine
In order to experiment with MarketRank, we developed a blog search engine called Blog Surf. Since blog posts are likely to be shared in online communities, we believe that MarketRank is particularly well-suited to the purpose of ranking blog posts.
There is an endless list of edge cases, improvements, and open questions for MarketRank. These questions cannot be solved in theory, only in practice. Blog Surf will serve as the experimentation ground for future iterations of MarketRank and other ranking algorithms we may come up with.
6. Conclusion
We have proposed a new method for ranking the quality of websites that is immune to SEO. We start off by counting upvotes from online communities, then make these numbers useful for comparison by adjusting for inflation and converting to one common currency.
We have shown that MarketRank is capable of producing a better quality ranking than Google for some queries.
This paper is published in hopes that you use it in your search engines and directories. There are many things left to be built when it comes to organizing information on the internet, and we hope that this simple algorithm will make it easier to have a baseline quality metric in such projects.
References
[1] The top-ranking HTML editor on Google is an SEO scam (https://casparwre.de/blog/seo-scam/)
[2] The mermaid is taking over Google search in Norway (https://alexskra.com/blog/the-mermaid-is-taking-over-google-search-in-norway/)
[3] A New Google (https://dcgross.com/a-new-google)
[4] Google is no longer producing high quality search results...
[5] Google [OC] (https://www.reddit.com/r/comics/comments/svgabe/google_oc/)
[6] The PageRank Citation Ranking: Bringing Order to the Web (http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
[7] Blog Surf (https://blogsurf.io)