Introducing SEOmoz’s Updated Page Authority and Domain Authority

Here at Moz, we take metrics and analytics seriously and work hard to ensure that our metrics are first-rate. Among our most important link metrics are Page Authority and Domain Authority. Accordingly, we have been working to improve them so that they more accurately reflect a given page’s or domain’s ability to rank in search results. This blog entry gives an overview of these metrics and then introduces our new Authority models with a deep technical dive.

What are Page and Domain Authority?

Page and Domain Authority are machine learning ranking models that predict how likely a single page or domain is to rank in search results, regardless of page content. Their input is the 41 link metrics available in our Linkscape URL Metrics API call, and their output is a score on a scale from 1 to 100. They are keyword agnostic because they use no information about the page content.

Why are Page and Domain Authority being updated?

Since these models predict search engine position, it is important to update them periodically to capture changes in the search engines’ ranking algorithms. In addition, this update includes some changes to the underlying models that increase their accuracy. Our favorite measure of accuracy is the mean Spearman correlation over a collection of SERPs. The next chart compares the correlations on several previous indices and on the next index release (Index 47).

The new model outperforms the old model on the same data using the top 30 search results, and performs better still when more results are used (top 50). Note that these are out-of-sample predictions.
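To make the accuracy measure concrete, here is a minimal Python sketch of a mean Spearman correlation over SERPs. It assumes each SERP is given as a list of predicted scores ordered by search position (first result first) and correlates each score with the negated position, so a positive value means higher scores go with better positions. The sign convention and the helper names are illustrative assumptions, not Moz’s evaluation code.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_serp_spearman(serps):
    """Mean Spearman correlation over SERPs.

    Each SERP is a list of predicted scores ordered by search position (first
    result first). Scores are correlated with the negated position, so +1 means
    the scores perfectly reproduce the SERP order.
    """
    per_serp = []
    for scores in serps:
        if len(set(scores)) < 2:
            continue  # correlation is undefined when every score is the same
        negated_positions = [-p for p in range(1, len(scores) + 1)]
        per_serp.append(spearman(scores, negated_positions))
    return mean(per_serp)


print(mean_serp_spearman([[90, 70, 50], [40, 60, 20]]))  # approximately 0.75
```

The first sample SERP is perfectly ordered (correlation 1) and the second only partly (0.5), so the mean lands near 0.75. Averaging per-SERP correlations, rather than pooling all results, keeps each query’s ordering separate.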

When will the models change? Will this affect my scores?

The models will be updated when we roll out the next Linkscape index update, sometime during the week of November 28. Your scores will likely change a little, and some may change by 20 points or more. Later in this post I’ll present data showing that most PRO and Free Trial members with campaigns will see a slight increase in their Page Authority.

What does this mean if I use Page Authority and Domain Authority data?

First, the metrics will be better at predicting search position, and Page Authority will remain the single highest-correlated metric with search position that we have seen (including mozRank and the other 100+ metrics we examined in our Search Engine Ranking Factors study). However, since we don’t yet have a good web spam scoring system, some sites that manipulate search engines will slip past us (and show up as outliers), so a human review is still wise.

Before presenting some details of the models, I’d like to illustrate what we mean by a “machine learning ranking model.” The table below shows the top 26 results for the keyword “pumpkin recipes” along with a few of our Linkscape metrics (Google-US search engine; this comes from an older data set and index, but it serves as a good illustration).

Pumpkin Recipes SERP result

As you can see, there is quite a spread in the metrics shown: some pages have only a few links while others have 1,000+. Linking Root Domains likewise range from only 46 to 200,000+. The Page Authority model takes these link metrics as input (plus 36 other link metrics not shown) and predicts the SERP ordering. Because it considers only link metrics (and explicitly ignores any page or keyword content), while search engines weigh many other ranking factors, the model cannot be 100% accurate. Indeed, in this SERP the top result benefits from an exact-match domain for the keyword, which helps explain its #1 position despite its relatively weak link metrics. Still, because Page Authority takes only link metrics as input, it is a single aggregate score that summarizes how likely a page is to rank based on links alone. Domain Authority does the same for domain-wide ranking. Both models are trained on a large collection of Google-US SERPs.
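As a rough illustration of a link-only ranking model, the sketch below scores pages from a few link metrics and sorts them to predict a SERP ordering. The metric names, the log transform and the weights are all hypothetical: the real Page Authority model learns its behavior from SERP data and uses all 41 metrics, whereas these weights are simply picked by hand.

```python
import math

# Hypothetical, hand-picked weights for three of the link metrics in the table.
# They are NOT the weights of the real Page Authority model.
WEIGHTS = {"links": 1.0, "linking_root_domains": 3.0, "mozrank": 5.0}


def toy_link_score(metrics):
    """Score a page from link metrics only; page and keyword content are ignored."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # log1p compresses heavy-tailed counts such as 46 vs 200,000+ linking root domains.
        score += weight * math.log1p(metrics.get(name, 0.0))
    return score


def predict_serp_order(pages):
    """pages maps URL -> link metrics; returns URLs from highest to lowest score."""
    return sorted(pages, key=lambda url: toy_link_score(pages[url]), reverse=True)


# Made-up sample metrics for three hypothetical pages.
pages = {
    "page-a": {"links": 12, "linking_root_domains": 46, "mozrank": 4.1},
    "page-b": {"links": 1500, "linking_root_domains": 200000, "mozrank": 6.2},
    "page-c": {"links": 300, "linking_root_domains": 900, "mozrank": 5.0},
}
print(predict_serp_order(pages))  # ['page-b', 'page-c', 'page-a']
```

Because a score like this never looks at content, a page that ranks on something else (such as the exact-match domain in the table) will be under-ranked, which is the kind of error described above.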

Despite restricting to only link metrics, the new Page and Domain Authority models do a good job of predicting SERP ordering and improve substantially over the existing models. This increased accuracy is due in part to the new model’s ability to better separate pages with moderate Page Authority values into higher and lower scores.

This chart shows the distribution of Page Authority values for the new and old models over a data set generated from 10,000+ SERPs that includes 200,000+ unique pages (similar to the one used in our Search Engine Ranking Factors study). As you can see, the new model has “fatter tails”: it moves some pages with moderate scores to higher and lower values, giving it better discriminating power. The average Page Authority is about the same for both models, but the new model has a higher standard deviation, consistent with a larger spread. This larger spread is not limited to the smaller SERP data set; it is also present across our entire 40+ billion page index (plotted with the logarithm of the page/domain count to show detail in the tails).
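The “same average, larger spread” comparison can be checked with a few summary statistics. This sketch assumes two lists of scores and uses hypothetical tail cut-offs of 20 and 80 to measure how many pages sit in the tails; the sample scores are made up.

```python
from statistics import mean, pstdev


def spread_summary(scores, low=20, high=80):
    """Mean, standard deviation and share of scores below `low` or above `high`."""
    in_tails = sum(1 for s in scores if s < low or s > high)
    return {"mean": mean(scores), "std": pstdev(scores), "tail_share": in_tails / len(scores)}


old_model = [40, 45, 50, 55, 60]  # made-up scores
new_model = [15, 35, 50, 65, 85]  # same pages, wider spread
print(spread_summary(old_model))  # mean 50, std about 7.1, tail_share 0.0
print(spread_summary(new_model))  # mean 50, std about 24.1, tail_share 0.4
```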

One interesting comparison is the change in Page Authority for the domains, subdomains and sub-folders that PRO and Free Trial members are tracking in our campaign-based tools.

The top left panel in the chart shows that the new model shifts the distribution of Page Authority for the active domains, subdomains and sub-folders to the right. The top right panel shows the distribution of the change in Page Authority: most campaigns see a small increase in their scores (the average increase is 3.7), and some sites increase by 20 points or more. The bottom panel is a scatter plot of the individual campaign changes and shows that 82% of the active domains, subdomains and sub-folders will see an increase in their Page Authority (the dots above the gray line). Note that these comparisons reflect only the change in the model; any links these campaigns have acquired since the last index update will increase their scores, and conversely, any links they have lost will decrease them.
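The campaign summary boils down to simple arithmetic on per-campaign deltas. A minimal sketch, assuming paired lists of old and new scores for the same campaigns (the sample values are made up); if the gray line in the scatter plot is y = x, a dot lies above it exactly when its delta is positive.

```python
from statistics import mean


def summarize_changes(old_scores, new_scores):
    """Average change (new - old) and share of items whose score increased."""
    if len(old_scores) != len(new_scores):
        raise ValueError("score lists must be paired item by item")
    deltas = [new - old for old, new in zip(old_scores, new_scores)]
    share_increased = sum(1 for d in deltas if d > 0) / len(deltas)
    return mean(deltas), share_increased


# Made-up Page Authority values for four campaigns, old model vs new model.
avg_change, share_up = summarize_changes([30, 42, 55, 61], [34, 40, 59, 66])
print(avg_change, share_up)  # 2.75 0.75
```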

The remainder of this post provides more detail about these metrics. To sum up this first part, the models underlying the Page and Domain Authority metrics will be updated with the next Linkscape index update. This will improve their ability to predict search position, due in part to the new model’s better ability to separate pages based on their link profiles. Page Authority will remain the single highest correlated metric with search position that we have seen.

The rest of the post takes a deeper look at these models, and much of what follows is quite technical. Fortunately, none of it is needed to actually use these Authority scores (just as understanding the details of Google’s search algorithm is not necessary to use Google). However, if you are curious about the details, read on.

The previous discussion has centered on distributions of Page Authority across a set of pages. To better understand the models’ characteristics, we need to explore their behavior on the inputs. However, the inputs form a 41-dimensional space, and it’s impossible (for me at least!) to visualize anything in 41 dimensions. Instead, we can try to reduce the dimensionality to something more manageable. The intuition here is that pages with a lot of links probably also have a lot of external links, followed links, a high mozRank, etc. Likewise, domains with a lot of linking root domains probably also have a lot of linking IPs, linking subdomains, a high domain mozRank, etc. One approach would be simply to select a subset of metrics (like those in the “pumpkin recipes” SERP table above) and examine those. However, this throws away the information in the other metrics and will inherently be noisier than an approach that uses all of them. Principal Component Analysis (PCA) is an alternative that uses all of the data. Before diving into the PCA decomposition of the link metrics, I’ll take a step back and explain what PCA is with an example.

Principal Component Analysis is a technique that reduces dimensionality by projecting the data onto Principal Components (PCs) that explain most of the variability in the original data. This figure illustrates PCA on a small two-dimensional data set:

This sample data looks roughly like an ellipse. PCA computes two principal components, illustrated by the red lines labeled in the graph, that roughly align with the axes of the ellipse. One representation of the data is the familiar (x, y) coordinates. A second, equivalent representation is the projection of the data onto the principal components, illustrated by the labeled points. Take the upper point (7.5, 6). Given these two values alone, it’s hard to tell where it lies in the ellipse. However, projecting it onto the PCs gives (4.5, 1.2), which tells us that it is far to the right of the center along the main axis (the 4.5 value) and a little up along the second axis (the 1.2 value).
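A small NumPy sketch of the same idea: compute the principal components from the covariance matrix of the data, then project a point onto them. The data here is a synthetic rotated ellipse, not the data in the figure, so projecting (7.5, 6) will not reproduce (4.5, 1.2); note also that the sign of each PC is arbitrary, so a projection can come out with flipped signs.

```python
import numpy as np


def pca(data):
    """Return (center, components, variances); components are rows, largest variance first."""
    center = data.mean(axis=0)
    cov = np.cov(data - center, rowvar=False)
    variances, vectors = np.linalg.eigh(cov)  # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(variances)[::-1]
    return center, vectors[:, order].T, variances[order]


def project(points, center, components):
    """Coordinates of each point along each principal component."""
    return (np.atleast_2d(points) - center) @ components.T


# Synthetic ellipse: stretched 3:1, rotated 30 degrees, centered at (5, 4).
rng = np.random.default_rng(0)
angle = np.deg2rad(30)
rotation = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
data = (rng.normal(size=(500, 2)) * [3.0, 1.0]) @ rotation.T + [5.0, 4.0]

center, components, variances = pca(data)
print(project([7.5, 6.0], center, components))  # (distance along PC1, distance along PC2)
```

Because the components are orthonormal, the projection is just a rotation of the centered coordinates: the point keeps its distance from the center, but the new coordinates say how far it sits along each axis of the ellipse.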

We can do the same thing with the link metrics, only instead of using two inputs we use all 41 inputs. After doing so, something remarkable happens:

Two principal components naturally emerge that together explain 88% of the variance in the original data! Put another way, almost all of the data lies in a sort of strange ellipse in our 41-dimensional space. Moreover, these PCs connect naturally to our intuition. The first PC, which I’ll call the Domain/Subdomain PC, projects strongly onto the domain- and subdomain-related metrics (upper panel, blue and red lines) and has only a very small projection onto the page metrics (upper panel, green lines). The second PC has the opposite property: it projects strongly onto the page-related metrics, with a small projection onto the Domain/Subdomain metrics.
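To see how two PCs can capture most of the variance of 41 correlated metrics, this sketch generates synthetic data driven by two hypothetical latent factors (a “domain strength” and a “page strength”) and reports the explained-variance ratio and how each PC weights the two groups of metrics. The 20/21 split of metrics, the factor scales and the noise level are assumptions chosen for illustration; the synthetic metrics already share a common scale, which sidesteps the scaling choices real link counts would require.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pages = 2000
n_domain, n_page = 20, 21  # 41 metrics in total; this split is hypothetical

# Two hypothetical latent factors. Giving them different scales keeps the two
# leading eigenvalues well separated, so each PC lines up with one group.
domain_strength = 2.0 * rng.normal(size=n_pages)
page_strength = rng.normal(size=n_pages)
domain_metrics = domain_strength[:, None] + 0.3 * rng.normal(size=(n_pages, n_domain))
page_metrics = page_strength[:, None] + 0.3 * rng.normal(size=(n_pages, n_page))
metrics = np.hstack([domain_metrics, page_metrics])

# PCA via the eigendecomposition of the covariance matrix, largest eigenvalue first.
variances, vectors = np.linalg.eigh(np.cov(metrics, rowvar=False))
order = np.argsort(variances)[::-1]
variances, vectors = variances[order], vectors[:, order]

explained = variances / variances.sum()
print(f"first two PCs explain {explained[:2].sum():.0%} of the variance")

for k in range(2):
    domain_weight = np.abs(vectors[:n_domain, k]).sum()
    page_weight = np.abs(vectors[n_domain:, k]).sum()
    print(f"PC{k + 1}: domain-metric weight {domain_weight:.2f}, page-metric weight {page_weight:.2f}")
```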

Don’t worry if you didn’t follow all of the technical mumbo jumbo in the last few paragraphs. Here’s the key point: instead of talking about the number of links, followed external links to domains, linking root domains, etc., we can talk about just two things, an aggregate domain/subdomain link metric and an aggregate page link metric, and still recover most of the information in the original 41 metrics.

Armed with this new knowledge, we can revisit the 10K SERP data and analyze it with these aggregate metrics.

This chart shows the joint distribution of the 10K SERP data projected onto these PCs, along with the marginal distribution of each PC along the top and right-hand side. At the bottom left of the chart are pages with low values for both PCs: pages that don’t have many links, on domains that don’t have many links. There aren’t many of these in the SERP data, since such pages are unlikely to rank in search results. In the upper right are heavily linked-to pages on heavily linked-to domains, the most popular pages on the internet. Again, there aren’t many of these in the SERP data, because there aren’t many of them on the internet (e.g. twitter.com, google.com). Interestingly, most of the SERP data falls into one of two distinct clusters, which we can identify by examining the following figure:
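The joint distribution and its two marginals can be computed with a 2-D histogram. A minimal sketch, assuming `pc1` and `pc2` are arrays holding each result’s projections onto the two PCs; the random stand-in data and the bin count are arbitrary choices for illustration.

```python
import numpy as np


def joint_and_marginals(pc1, pc2, bins=20):
    """2-D histogram of (pc1, pc2) and the marginal counts of each coordinate."""
    joint, x_edges, y_edges = np.histogram2d(pc1, pc2, bins=bins)
    # joint[i, j] counts results with pc1 in bin i and pc2 in bin j.
    marginal_pc1 = joint.sum(axis=1)  # sum over pc2 bins: the histogram along the top
    marginal_pc2 = joint.sum(axis=0)  # sum over pc1 bins: the histogram on the right
    return joint, marginal_pc1, marginal_pc2, x_edges, y_edges


rng = np.random.default_rng(1)
pc1, pc2 = rng.normal(size=1000), rng.normal(size=1000)  # stand-in projections
joint, top, right, _, _ = joint_and_marginals(pc1, pc2)
print(joint.shape, top.sum(), right.sum())  # (20, 20) 1000.0 1000.0
```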

This chart shows the average folder depth of the search results, where folder depth is defined as the number of slashes (/) after the home page (with the home page itself defined as depth 1). Comparing it with the previous chart, we can identify the two distinct clusters as home pages and deep pages on heavily linked-to domains.
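The folder-depth definition can be sketched as a short Python function. It assumes depth is the number of slashes in the URL path, with an empty path treated as the home page; how the original analysis handled trailing slashes or query strings is not stated, so those cases are assumptions here (in this version a trailing slash adds one to the depth).

```python
from urllib.parse import urlparse


def folder_depth(url):
    """Number of slashes in the URL path; the home page ('/' or empty path) has depth 1."""
    path = urlparse(url).path or "/"
    return path.count("/")


for url in ["http://www.example.com",
            "http://www.example.com/",
            "http://www.example.com/recipes/pumpkin-pie",
            "http://www.example.com/recipes/pumpkin/"]:
    print(url, folder_depth(url))  # depths 1, 1, 2, 3
```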

To circle back to search position, we can plot the average search position:

We see a general trend toward higher search position as the aggregate page and domain metrics increase. This data set only collected the top 30 results for each keyword, so values of average search position greater than 16 are in the bottom half of our data set. Finally, we can visually confirm that our Page and Domain Authority models capture this behavior and gain further insight into the new vs old model differences:
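The average-position chart can be approximated by binning results on their two PC projections and averaging the search position within each bin. This sketch uses a hypothetical square bin width (the chart’s actual binning is not described); it also shows where the cut-off of 16 comes from: with the top 30 results, the middle position is 15.5.

```python
import math
from collections import defaultdict
from statistics import mean

TOP_N = 30
MIDPOINT = mean(range(1, TOP_N + 1))  # 15.5: positions 16-30 form the bottom half


def average_position_by_bin(results, bin_width=1.0):
    """results: iterable of (pc1, pc2, position). Returns {(bin_x, bin_y): mean position}."""
    positions_by_bin = defaultdict(list)
    for pc1, pc2, position in results:
        key = (math.floor(pc1 / bin_width), math.floor(pc2 / bin_width))
        positions_by_bin[key].append(position)
    return {key: mean(positions) for key, positions in positions_by_bin.items()}


print(MIDPOINT)  # 15.5
print(average_position_by_bin([(0.2, 0.3, 1), (0.7, 0.9, 3), (2.5, 0.1, 30)]))
# {(0, 0): 2, (2, 0): 30}
```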

This is a dense figure, but here are the most important points. First, Page Authority captures the overall behavior seen in the average search position plot, with higher scores for pages that rank higher and lower scores for pages that rank lower (top left). Second, comparing the old and new models, we see that the new model predicts higher scores for the most heavily linked-to pages and lower scores for the least heavily linked-to pages, consistent with our earlier observation that the new model does a better job of discriminating among pages.

Source: http://www.seomoz.org/blog/introducing-seomoz-updated-page-authority-and-domain-authority
