INTERNAL LINK MAPPING

HOW TO CREATE A VISUAL LINK MAP

Website architecture can make or break your site.

A well-thought-out internal linking strategy can give a site a nice rankings boost, whereas if users can't find a page by clicking through the site, Google is unlikely to rank that page very highly at all.

We can shape our website's internal architecture so that all pages are accessible within as few clicks as possible, creating a great user experience and saving crawl budget.

Simple content silo link map

In the simple content silo above, you can see that any page on the example site is reachable within three clicks from the homepage, and the site has 40 pages in total (including the homepage).

By adding just one more link from each page (4 links from the homepage instead of 3, then 4 from each page on the next level, and so on), we'd have 85 pages available.

Now what if each page linked to, say, 10 pages on the next level? We'd have 1,111 pages, all of which are navigable within 3 clicks from the homepage.
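The page counts above come from adding up the pages on each level: with b links per page and a maximum of d clicks, an idealised silo holds 1 + b + b^2 + ... + b^d pages, counting the homepage. Here's a minimal Python sketch of that sum. It assumes every page links to b pages that nothing else links to, which real sites never manage, so treat the result as an upper bound:

```python
def pages_within_clicks(links_per_page: int, max_clicks: int) -> int:
    """Pages in an idealised silo: the homepage plus every page reachable
    in 1..max_clicks clicks, assuming no page is linked from two places."""
    # Level k holds links_per_page ** k pages; level 0 is the homepage.
    return sum(links_per_page ** level for level in range(max_clicks + 1))


if __name__ == "__main__":
    for links in (3, 4, 10):
        print(f"{links} links per page: {pages_within_clicks(links, 3):,} pages")
```

For 3, 4 and 10 links per page and a depth of three clicks, this gives 40, 85 and 1,111 pages respectively.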

Pages that are hard to find by clicking are unlikely to rank as well as they could if they were easy to find.

This is one of the reasons that the homepage of a website will often rank for the most keywords and get the most traffic (this isn’t always the case, but the homepage will usually be in at least the top 5-10 pages in terms of traffic, even for very large sites).

In other words, if your site’s internal links are a disorganised mess, it’s bad news for users trying to navigate your site.

What's more, it also forces Google to use more resources to crawl your site, which uses up more of your crawl budget.

This issue is further compounded when you consider the flow of link juice.

Just like how we build backlinks from other sites to help pages rank, we can help a page be more likely to rank by linking to it from other pages within the same website.

So it should be clear by now that what you don't want is a disorganised internal link structure.

But how can you tell if your website's architecture is no good? Simply going through each page looking at which pages it links to is one way, but it will take a long time.

Crawling the site and checking the Inlinks is another way, but again, it's a load of data to try to work through simply looking at the text.

So, today I'm going to show you how to create a visual map of your website's internal links that includes each page's backlink profile (Ahrefs URL rating for this example, but you can use other metrics if you prefer).

You can use this to quickly identify which pages can pass a lot of link juice, and which need some more internal links to show they're important.

Tools you’ll need:

  1. Ahrefs 
  2. Screaming Frog
  3. Gephi
  4. Any spreadsheet software

CREATE THE LINK MAP

1. Collect internal links with Screaming Frog

The very first thing we need to do is crawl the site and collect all the internal links.

Spider configuration

We don't need much data from the crawl, so set up Screaming Frog as follows (Configuration > Spider):

Screaming Frog setup
Screaming Frog setup 2

Link positions analysis

It can be useful to ignore header, footer, menu, sidebar and similar links in our map, so that we only see links in the body content of each page. Screaming Frog now has a feature that does just that.

Go to Configuration > Custom > Link positions

The defaults are a good start, but you may find they aren't perfect for your site. You'll need a licensed version of Screaming Frog to add custom link positions.

Learn how to set up custom link positions here: https://www.screamingfrog.co.uk/how-to-analyse-link-position/#configure

Screaming Frog Link Positions

Crawl the website and export the internal links

Now it's time to get Screaming Frog going on the website. This crawl should be quicker than a full crawl, since we're not gathering much information from each page.

Assuming our link positions are set up correctly, we can go ahead and export the links as soon as the crawl has finished.

Go to Bulk Export > Links > All Outlinks

Clean the data

Open up the exported .csv. Depending on the size of the site you've analyzed and the number of internal links, this can be a large, resource-hungry file.

Filter column A so that only hyperlinks remain.

Filter column C to be only your domain (internal links).

Check the status codes (column G). Make sure they're all 200 - if not, I'd recommend going back and fixing those first or you'll end up with an inaccurate map.

Filter column I (link position) to only include Content links.

Rename column C to Target.

Delete all columns apart from B (Source) and C (Target).

Save or export the file as a CSV.
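If the export is too big to handle comfortably in a spreadsheet, the same cleaning steps can be scripted. The sketch below uses only Python's standard library. It assumes the export's headers are Type, Source, Destination, Status Code and Link Position, with the values Hyperlink, 200 and Content; those names are assumptions, so check them against your own export, as Screaming Frog's columns have changed between versions. The domain has to match the URL's host exactly (www.example.com is not example.com).

```python
import csv
import io
from typing import Iterable, Iterator, TextIO
from urllib.parse import urlparse


def clean_outlinks(rows: Iterable[dict], domain: str) -> Iterator[tuple]:
    """Yield (source, target) pairs for internal, in-content hyperlinks.

    Assumed headers: Type, Source, Destination, Status Code, Link Position.
    """
    for row in rows:
        if row["Type"] != "Hyperlink":
            continue  # drop images, CSS, JavaScript and other resources
        if urlparse(row["Destination"]).netloc != domain:
            continue  # keep internal links only
        if row["Status Code"] != "200":
            continue  # non-200 links would make the map inaccurate
        if row["Link Position"] != "Content":
            continue  # ignore header, footer, navigation and sidebar links
        yield row["Source"], row["Destination"]


def write_edges(pairs: Iterable[tuple], out: TextIO) -> None:
    """Write a Source,Target CSV that Gephi can import as an edges table."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    writer.writerows(pairs)


if __name__ == "__main__":
    # A tiny made-up export standing in for the real file.
    sample = io.StringIO(
        "Type,Source,Destination,Status Code,Link Position\n"
        "Hyperlink,https://example.com/,https://example.com/a,200,Content\n"
        "Hyperlink,https://example.com/,https://example.com/b,200,Navigation\n"
        "Image,https://example.com/,https://example.com/logo.png,200,Content\n"
        "Hyperlink,https://example.com/a,https://other.com/,200,Content\n"
    )
    output = io.StringIO()
    write_edges(clean_outlinks(csv.DictReader(sample), "example.com"), output)
    print(output.getvalue())
```

To run it on a real export, open the file with open(path, newline="") and pass csv.DictReader(file) in place of the sample; you may need to set the encoding to match the file.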

2. Gather URL ratings

Now we want to get the URL rating for each page on the site, so that when we map our internal links we can also see how strong each page's backlink profile is.

Go to Ahrefs Site Explorer and enter your domain.

In the left toolbar, under Pages, click Best by Links. You'll see a page that looks like this:

Then choose Export, Full Export.

Ahrefs Export

3. Combine internal links with URL ratings

Now we're going to put all the URL ratings and internal link data into a single spreadsheet.

Open the Ahrefs export and delete all columns except Page URL and URL rating. Now copy those two columns into your Screaming Frog spreadsheet, in columns C and D.

All Inlinks

Now:

  1. Insert a new column to the right of column A and name it "URL rating".
  2. Paste this exact formula into cell B2:
    =VLOOKUP(A2,D:E,2,FALSE)
  3. Press Enter.
  4. Double-click the little blue square at the bottom right of B2 to copy the formula into the whole column. You now have the URL rating of each source page, along with all the pages each one links to.
  5. Export as CSV.
Inlinks With URL Ratings
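The VLOOKUP step is really just a lookup-and-join, so here's a rough Python equivalent. The Ahrefs column names are passed in as parameters because I'm assuming them ("Page URL" and "URL Rating" in the example). One difference is deliberate: VLOOKUP with FALSE returns #N/A when a source URL isn't in the Ahrefs export (for example because of a trailing slash or an http/https mismatch), whereas this sketch falls back to a rating of 0. In the spreadsheet, wrapping the formula in IFERROR(..., 0) has a similar effect.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple


def load_ratings(rows: Iterable[dict], url_col: str, rating_col: str) -> Dict[str, float]:
    """Build a URL -> URL rating lookup from the Ahrefs export rows."""
    return {row[url_col]: float(row[rating_col]) for row in rows}


def add_source_ratings(
    edges: Iterable[Tuple[str, str]],
    ratings: Dict[str, float],
    default: float = 0.0,
) -> Iterator[Tuple[str, float, str]]:
    """Yield (source, url_rating, target), like VLOOKUP(A2, D:E, 2, FALSE).

    Unlike VLOOKUP, a source URL missing from the Ahrefs data gets
    `default` rather than #N/A (an assumption: no data is treated as 0).
    """
    for source, target in edges:
        yield source, ratings.get(source, default), target


if __name__ == "__main__":
    # Made-up data; "Page URL" and "URL Rating" are assumed header names.
    ahrefs = io.StringIO("Page URL,URL Rating\nhttps://example.com/,42\n")
    ratings = load_ratings(csv.DictReader(ahrefs), "Page URL", "URL Rating")
    edges = [
        ("https://example.com/", "https://example.com/a"),
        ("https://example.com/a", "https://example.com/b"),
    ]
    for row in add_source_ratings(edges, ratings):
        print(row)
```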

4. Visualizing your internal links

Now it's time to take all this data and start mapping our website architecture. There are different ways to do this to get slightly different visualizations, so it's worth experimenting with settings once you know the basics of how it works.

  1. Open Gephi and create a new project.
  2. Now go to File, Import Spreadsheet, and navigate to where your .csv file is. Tip - sometimes Gephi can’t open my downloads folder where the sheet is, so I move the file to desktop.
  3. Choose Edges Table, and click next.
  4. In the next dialogue box, tick the URL rating box and choose Float.
  5. Untick Page URL and URL rating (desc).
Import Internal Link Data

You'll get something that looks like this:

First link map

It looks quite cool as it is, like some kind of nightmare spiderweb, but we can't get a lot of useful insights from it.

What we have now is a scrambled mess of nodes and edges.

Node = a small circle that represents a page on your site.

Edge = an arrow that represents an internal link (they look like lines until you zoom in).

Make sure you're in the Overview tab, and in the toolbar on the left, choose a layout from the drop-down menu. Fruchterman Reingold is a good one for this.

Click run, and it will give you something more like this:

Second link map

Now it's beginning to take shape, but there are a few more things we need to do to turn it into a visualization we can use.

In the appearance section of the Overview tab, click Nodes, then click the three circles on the top right of the toolbar, choose Ranking, then In-Degree from the drop down menu.

Each node (a page on your site) is now sized according to how many internal links point to it.
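In-degree is just the number of links pointing at a page, so it's easy to check outside Gephi too. This Python sketch counts it from the Source/Target edge list. It counts each distinct (source, target) pair once, which is an assumption on my part, so a page that links to the same URL twice only contributes one link:

```python
from typing import Dict, Iterable, Tuple


def in_degree(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count incoming internal links per page from (source, target) pairs.

    Each distinct (source, target) pair is counted once, so repeated links
    from the same page don't inflate the count (an assumption of this sketch).
    Pages that only link out still appear, with an in-degree of 0.
    """
    counts: Dict[str, int] = {}
    for source, target in set(edges):
        counts.setdefault(source, 0)
        counts[target] = counts.get(target, 0) + 1
    return counts


if __name__ == "__main__":
    links = [("home", "blog"), ("home", "blog"), ("about", "blog"), ("blog", "home")]
    for page, count in sorted(in_degree(links).items(), key=lambda item: -item[1]):
        print(page, count)
```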

Now click the Edges tab, and choose the color palette on the right. Click Ranking and choose URL rating from the drop-down menu, then choose a color scheme you like. I've just used the default.

Now your nodes are colored based on the URL ratings we got from Ahrefs. The higher the URL rating, the darker the node!

Edited link map

Now go to the Data Laboratory tab, click Copy data to other column in the box at the bottom of the screen, select ID, then choose Label in the popup box. This gives each node the label of its URL.

copy link data

Finally, go to the Preview tab, select Show Labels and click refresh.

I've kept the labels hidden for this example, but it should look something along these lines:

Finished link map

APPLY ALGORITHMS

What we've got now is pretty cool, but we're not done yet.

All this is pointless unless we can gain some valuable insights from it about the site structure and, more importantly, how to improve it.

From the image above, we can see that the larger nodes have more incoming internal links, and the darker colored nodes have more incoming external links (from the Ahrefs URL rating), while the smaller, lighter ones have fewer internal and external links.

You can zoom in to see the labels of each page and assess whether the internal linking is good for that particular page.

As we know, we can influence the flow of (external) link juice throughout the site by linking from pages with a lot of backlinks to those that we want to rank. So on our link map, we can make a note of the darker colored nodes and create internal links from those to pages with smaller nodes (which have few incoming internal links).

If you have important pages that are small and light colored, you might want to add some internal links from larger, darker nodes.
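The same idea can be turned into a rough shortlist. This hypothetical helper pairs the pages with the highest URL rating with the pages that have the fewest incoming internal links, skipping links that already exist. The top_n cut-off is an arbitrary assumption, and every suggestion still needs a human check for relevance and business value:

```python
from typing import Dict, Iterable, List, Tuple


def link_suggestions(
    edges: Iterable[Tuple[str, str]],
    url_rating: Dict[str, float],
    in_degree: Dict[str, int],
    top_n: int = 5,
) -> List[Tuple[str, str]]:
    """Pair high-URL-rating pages with poorly linked pages they don't yet link to.

    top_n is an arbitrary cut-off applied to both lists.
    """
    existing = set(edges)
    # "Darker" nodes: the pages with the most backlink strength.
    donors = sorted(url_rating, key=url_rating.get, reverse=True)[:top_n]
    # "Smaller" nodes: the pages with the fewest incoming internal links.
    receivers = sorted(in_degree, key=in_degree.get)[:top_n]
    return [
        (donor, receiver)
        for donor in donors
        for receiver in receivers
        if donor != receiver and (donor, receiver) not in existing
    ]


if __name__ == "__main__":
    edges = [("/", "/guide"), ("/guide", "/pricing")]
    ratings = {"/": 40.0, "/guide": 25.0, "/pricing": 5.0, "/faq": 1.0}
    inlinks = {"/": 0, "/guide": 1, "/pricing": 1, "/faq": 0}
    for source, target in link_suggestions(edges, ratings, inlinks, top_n=2):
        print(f"Consider linking {source} -> {target}")
```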

So far, though, most of this could be worked out just by checking the URL Ratings in Ahrefs.

So what's the point in all this?

How can we make this whole exercise worthwhile?

Well, one of the great things about Gephi is that it's not an "SEO tool", so it can do a load of interesting things that SEO tool creators haven't even thought about.

Gephi is an "open-source network analysis and visualization software package" - Wikipedia

This is good, because our internal link map is indeed a network of interlinked pages that we want to analyse.

Let's have a look at some of the tools Gephi has that we can use to analyse our link map (at this point I'd like to clarify that I'm by no means an expert on statistical analysis, just an SEO with an interest in learning new ways to do stuff).

On the right-hand side of the Gephi interface you have a Network Overview menu that contains some useful tools.

Gephi network overview

5. Network diameter

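Besides the network's diameter itself (the longest of all the shortest paths between pages), Network Diameter calculates three measurements based on the shortest paths between nodes (our pages).

Betweenness centrality - How often a node appears on the shortest paths between other pairs of nodes.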

Closeness centrality - The average distance from a starting node to all other nodes.

Eccentricity - The distance from a given starting node to the farthest node from it.

Choose "Directed" (because each link on the website only goes in one direction) and click run. You'll see some graphs with the output but we can close that window.
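To make the distance-based definitions concrete, here's a small Python sketch for a directed link graph. It follows the definitions above: eccentricity is the largest number of clicks to any reachable page, and closeness is expressed as the average number of clicks. Pages that can't be reached from the starting page are skipped, which is one convention among several, and Gephi's own normalisation may differ (some versions report the inverse of this average), so treat the numbers as illustrative:

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed adjacency map: page -> set of pages it links to."""
    graph: Dict[str, Set[str]] = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def click_distances(graph: Dict[str, Set[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for linked in graph[page]:
            if linked not in distances:
                distances[linked] = distances[page] + 1
                queue.append(linked)
    return distances


def eccentricity_and_closeness(graph: Dict[str, Set[str]], start: str) -> Tuple[int, float]:
    """(clicks to the farthest reachable page, average clicks to reachable pages).

    Pages that can't be reached from `start` are left out.
    """
    others = [d for page, d in click_distances(graph, start).items() if page != start]
    if not others:
        return 0, 0.0
    return max(others), sum(others) / len(others)


if __name__ == "__main__":
    site = build_graph([("home", "a"), ("home", "b"), ("a", "c"), ("c", "home")])
    for page in site:
        print(page, eccentricity_and_closeness(site, page))
```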

We can, for example, resize the nodes based on betweenness centrality to see which pages are most frequently acting as bridges between any two pages, or by closeness centrality to see which pages are closest to all others in terms of clicks.

These pages can be especially good targets for building (external, incoming) links, as they will help to spread that link juice (or authority, or PageRank, whatever you want to call it) around the site.
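For anyone curious what betweenness centrality actually counts, here's a minimal Python sketch using Brandes' algorithm for unweighted, directed graphs, a standard way of computing it. It returns raw (unnormalised) scores; whether Gephi normalises its figures depends on its settings, so compare the ranking of pages rather than the exact values:

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed adjacency map: page -> set of pages it links to."""
    graph: Dict[str, Set[str]] = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def betweenness(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted, directed)."""
    scores = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths to each page.
        order: List[str] = []
        parents: Dict[str, List[str]] = {page: [] for page in graph}
        paths = dict.fromkeys(graph, 0)
        paths[source] = 1
        distance = dict.fromkeys(graph, -1)
        distance[source] = 0
        queue = deque([source])
        while queue:
            page = queue.popleft()
            order.append(page)
            for linked in graph[page]:
                if distance[linked] < 0:
                    distance[linked] = distance[page] + 1
                    queue.append(linked)
                if distance[linked] == distance[page] + 1:
                    paths[linked] += paths[page]
                    parents[linked].append(page)
        # Walk back from the farthest pages, crediting each intermediate page
        # with its share of the shortest paths that pass through it.
        dependency = dict.fromkeys(graph, 0.0)
        for page in reversed(order):
            for parent in parents[page]:
                dependency[parent] += paths[parent] / paths[page] * (1 + dependency[page])
            if page != source:
                scores[page] += dependency[page]
    return scores


if __name__ == "__main__":
    # Two routes from "home" to "deep": each middle page carries half of them.
    site = build_graph([("home", "a"), ("home", "b"), ("a", "deep"), ("b", "deep")])
    print(betweenness(site))
```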

6. Internal page rank

Internal page rank (similar to, but not the same as, Google's PageRank) is the likelihood of someone landing on a given page if they click links at random. The most linked-to pages will tend to score highest, and will appear largest if you size nodes by this metric.
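As a rough illustration of what the PageRank statistic computes, here's a minimal power-iteration sketch in Python. The damping factor of 0.85 is the conventional value; the tolerance, iteration cap and handling of pages with no outgoing links are my own assumptions, and Gephi's implementation may differ in detail, so expect similar rankings rather than identical numbers:

```python
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed adjacency map: page -> set of pages it links to."""
    graph: Dict[str, Set[str]] = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def pagerank(
    graph: Dict[str, Set[str]],
    damping: float = 0.85,
    tolerance: float = 1e-6,
    max_iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank over internal links; scores sum to 1.

    A random clicker follows a link with probability `damping`, otherwise
    jumps to a random page. Pages with no outgoing links spread their score
    evenly over all pages (one common convention).
    """
    pages = list(graph)
    if not pages:
        return {}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(max_iterations):
        dangling = sum(rank[page] for page in pages if not graph[page])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(pages, base)
        for page in pages:
            if graph[page]:
                share = damping * rank[page] / len(graph[page])
                for linked in graph[page]:
                    new_rank[linked] += share
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tolerance:
            break
    return rank


if __name__ == "__main__":
    site = build_graph([("home", "guide"), ("blog", "guide"), ("guide", "home")])
    for page, score in sorted(pagerank(site).items(), key=lambda item: -item[1]):
        print(f"{page}: {score:.3f}")
```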

7. Modularity classes

Modularity classes are clusters of pages that are more densely linked to one another than they are to pages in other clusters. If unrelated pages end up clustered together, it can be a sign that the internal links between them aren't thematically relevant.

Here's the full explanation - https://arxiv.org/pdf/0803.0476.pdf.
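Gephi's Modularity statistic is based on the Louvain method from that paper, which searches for the grouping of pages with the highest modularity score. The Python sketch below doesn't do that search. It only scores a grouping you already have (for example, the modularity class Gephi assigns to each page), which could be handy for comparing the site before and after changing internal links. It treats links as undirected and ignores duplicate links and self-links; both are simplifying assumptions:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[str, str]], community: Dict[str, Hashable]) -> float:
    """Modularity Q of a given partition, treating links as undirected.

    Q = sum over classes c of (links inside c / m) - (total degree of c / 2m) ** 2,
    where m is the number of distinct undirected links (self-links ignored).
    """
    undirected = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    m = len(undirected)
    if m == 0:
        return 0.0
    inside: Dict[Hashable, int] = defaultdict(int)
    degree: Dict[Hashable, int] = defaultdict(int)
    for link in undirected:
        a, b = tuple(link)
        degree[community[a]] += 1
        degree[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two separate triangles of pages, grouped by triangle.
    links = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("e", "f"), ("f", "d")]
    classes = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(modularity(links, classes))
```

For the two separate triangles grouped by triangle, Q works out to 0.5; putting every page in one class gives 0.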

I like to give each modularity class its own colour, as it makes it easy to see which pages are grouped together by their links, and therefore whether we'd want to improve internal linking in a more cluster-based way (if that's your thing).

ANALYSE THE RESULTS

8. Export your internal link data and dig deeper

This is where we'll leave Gephi and return to our spreadsheets.

We can export the node data from Gephi (the Data Laboratory has an Export table option) and sort it by betweenness centrality and internal page rank, as well as by Ahrefs' URL Rating. This makes it simple to identify the strongest and weakest pages in terms of internal linking.

Depending on the total number of pages and the data you see (the range of numbers for each), you can choose, say, the top and bottom 25, 50, 100, or 200 pages for each measurement.
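Picking the top and bottom pages for one measurement is just a sort. A minimal Python sketch, assuming each metric has been loaded into a dict of URL to score (for example, from the exported node table); n is whatever cut-off suits your site:

```python
from typing import Dict, List, Tuple


def top_and_bottom(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Return the n strongest pages (highest first) and n weakest (lowest first)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[::-1][:n]


if __name__ == "__main__":
    betweenness = {"/guide": 12.0, "/pricing": 3.5, "/blog": 8.0, "/contact": 0.0}
    strongest, weakest = top_and_bottom(betweenness, 2)
    print("Strongest:", strongest)
    print("Weakest:", weakest)
```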

You can see:

Strongest and weakest pages by betweenness centrality

Strong pages, those with high betweenness centrality, act as a bridge between other pages more often than others do. They can be used to help people find otherwise underlinked pages.

Weak pages act as a bridge less often. If there are pages here with strong backlink profiles then they are being underutilized and should have more outgoing links placed on them.

Strongest and weakest pages by Internal page rank

Placing links from stronger pages to weaker pages can help users find different areas of the website more easily. It can also reduce the resources bots need to crawl the site.

Weak pages are the pages that are more difficult to find by clicking links in content.

Strongest and weakest pages by URL Rating

URL rating is based on strength of a page’s backlink profile as measured by Ahrefs. Pages with a high URL rating are more likely to rank highly, and are powerful in helping other pages rank higher when linking internally.

Weak pages have lower URL rating due to fewer external backlinks. These pages are less likely to rank on their own for competitive search terms, but can be boosted via internal linking from stronger pages.

9. Combine the data and act on the results

So rather than having three separate sets of strong and weak pages, we can get some particularly actionable results by combining them and seeing which pages are most often strong or weak across the previous analyses.

In a new sheet or tab, copy all the strong pages from each of the three analyses, and do the same with the weak pages.

In an empty column, type the following formula into row two (make sure your URLs are in column A):

=countif(A:A, A2)

Copy that down the entire column and sort by Z-A.

This counts how many times each URL appears in the list (i.e. from the three sets of strong pages).

Now copy all the data and paste as plain text, then select column A and remove the duplicates. In Google Sheets that's done by clicking Data > Data Cleanup > Remove Duplicates.

Now we have what can be considered an overall strength (based on the algorithms we applied, but you can explore more in Gephi).

Overall strength is calculated by combining the top pages for each of the metrics (page rank, betweenness centrality, and UR) and counting how many times each appears. Pages identified using the lists of strong pages represent the strongest pages across the site and should be used for outgoing internal links.
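The COUNTIF-and-remove-duplicates routine can also be expressed as a simple tally. A minimal Python sketch, assuming each list holds the strong (or weak) URLs for one metric with no repeats inside a single list:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overall_strength(*page_lists: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how many lists each URL appears in, most frequent first.

    Mirrors =COUNTIF(A:A, A2) over the combined lists followed by removing
    duplicates; assumes each individual list has no repeated URLs.
    """
    counts: Counter = Counter()
    for pages in page_lists:
        counts.update(pages)
    return counts.most_common()


if __name__ == "__main__":
    by_betweenness = ["/guide", "/pricing", "/blog"]
    by_pagerank = ["/guide", "/blog", "/about"]
    by_url_rating = ["/guide", "/pricing", "/contact"]
    for url, appearances in overall_strength(by_betweenness, by_pagerank, by_url_rating):
        print(url, appearances)
```

Feeding in the lists of weak pages instead gives the inverse view.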

The same calculation can be done for the inverse strength of page rank and betweenness centrality (although sometimes UR isn't as useful for this).

These pages would be the first I’d create internal links to, depending on their business value.

PUBLISH AND SHARE

10. How to publish the internal link map to your website

This is only useful if you want to upload your map to your site in order to share it with your team, clients or readers.

The way I like to do it might not be the best way, as I'm no developer, but it's simple and it works.

Gephi doesn't have any website-ready outputs out of the box, but you can go to Tools, Plugins, and install a plugin called Sigma Exporter.

Now you'll have the option to export as Sigma.js template.

It will save a folder called Network. Inside are the files and folders you'll need to upload to your site.

Did you find this useful?

If you've got anything to add or want to give some feedback, let me know in the comments section below.

12 Comments

  1. wissiyou on 2nd November 2019 at 1:26 pm

    thanks for info. That’s crazy ScreamingFrog has not resolve to do this yet. Diagram is a great addition but without any information about link juice this is quite useless. thanks for your time

    • Joe Robinson on 6th February 2020 at 2:59 pm

      Thanks for your comment. The purpose of this is to show you how to create an internal link map of your website so you can find underlinked pages or underutilised resources. I may update the page in future with guides to some of the statistical analysis I perform.

      • Lina on 14th November 2021 at 1:53 pm

        Wow, you did an amazing job! but IMO it’s almost impossible to see the underlinked pages in this way… especially for multi-page website.. There is a great alternative at Jetoctopus. They made an Interlinking Structure efficiency report, that helps to see which pages need to be better linked, just in one glance. And the data is based on log files, which give straight answers to which exact amount of internal links on a page are most attractive to Bots.

        • Joe Robinson on 14th January 2022 at 3:02 pm

          Hey Alina, thanks for your comment – if you check the analysis section you’ll see how the data can be used to find underlinked pages.

    • Websst on 12th April 2020 at 4:10 pm

      Thank you for this publication and the use of the visual mapping.Strangely still there is no single tool for SEO that can do same at once in a single step.

  2. Tara Fitness on 13th January 2020 at 3:58 am

    Wow, who knew it was quite so difficult to create an internal linking map of your site. Thanks so much for the detailed how-to guide. I’ll be following it step-by-step as I need to reassess my linking strategy.

  3. Sophie on 5th June 2020 at 6:40 am

    Wow – this is super complex. I’m about halfway through and my head is spinning:)

    Just wondering, a lot of the values in my URL rating column created from the Vlookup function say #N/A. Is this going to cause issues in later steps? I don’t want to continue on if I need to remove those lines first.

    • Pravinraj on 18th September 2020 at 5:37 am

      You can add excel formula to make it zero instead

  4. Kiruba on 9th July 2020 at 10:47 pm

    This is very helpful. Thank you very much!

  5. Tessah on 26th July 2020 at 11:46 pm

    Thank you for taking the time to write this Joe. This is what I need to audit my websites.

  6. Dustin Montgomery on 8th October 2020 at 1:08 pm

    This is fantastic, thank you so much for sharing this guide.

  7. Phil Ohren on 4th August 2021 at 4:49 am

    Excellent link map visulisation! Love it. Very important for processing intent & content data.
