How to hide (noindex) a PDF in WordPress from search engines

Many of our clients offer whitepaper PDFs on their sites to generate leads, so they don’t want people to find those PDFs through a Google search. Instead, they want to collect a visitor’s email address before giving access to the whitepaper.

The easiest way to hide a PDF uploaded to WordPress from search engines, or to noindex it, is to do the following:

  1. Install and activate the Yoast WordPress SEO plugin
  2. Upload the PDF to the media library
  3. Edit the PDF in the media library. Depending on how your media library looks (grid view or list view), here’s how to find the Edit link:
    In grid view, click on the PDF and then click Edit More Details.
    In list view, click Edit on the PDF.
  4. In the Yoast SEO settings for the media item, click the gear icon and set “Meta robots index” to noindex. This will make sure the file (not just the media attachment page) is not indexed by search engines. Ideally, you should change this setting when you upload a new PDF. If the PDF already exists, it is probably already indexed by Google, and it might take some time for search engines to recrawl your site and drop it from the index.

Update: our client is using a plugin (WP Original Media Path) that uploads all media to a subdomain (https://static.domain.com), so we couldn’t use Yoast’s plugin, which is only set to work on https://domain.com.

Therefore, I added an X-Robots-Tag header in the .htaccess file to hide the PDF:

Header set X-Robots-Tag "noindex, noarchive, nosnippet"

Why use the X-Robots-Tag header instead of robots.txt:

The robots.txt file does not prevent your page or file from being listed in search results.

What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, the file can still be listed.

If you stop the bot from crawling your page using robots.txt, it will not have the chance to see the X-Robots-Tag: noindex response header. Therefore, never disallow a page in robots.txt if you employ the X-Robots-Tag header.
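To double-check that robots.txt isn't hiding the header from crawlers, something like the following Python sketch could help. It uses the standard library's urllib.robotparser to test whether a given set of robots.txt rules would let a bot fetch the PDF. The function name, the sample rules, and the PDF path are all made up for illustration; paste in your own robots.txt contents and URL.

```python
from urllib.robotparser import RobotFileParser


def crawlers_can_fetch(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules and URL, for illustration only.
blocking_rules = "User-agent: *\nDisallow: /wp-content/uploads/\n"
open_rules = "User-agent: *\nDisallow: /wp-admin/\n"
pdf_url = "https://static.domain.com/wp-content/uploads/whitepaper.pdf"

# With blocking_rules the bot never requests the PDF, so it never sees the
# X-Robots-Tag header. With open_rules it can crawl the file and read it.
blocked = not crawlers_can_fetch(blocking_rules, pdf_url)
crawlable = crawlers_can_fetch(open_rules, pdf_url)
```

If the function returns False for your PDF, the noindex header is effectively invisible to that bot, and the Disallow rule should be removed.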

I then used the Web Developer add-on in Firefox to check the response headers for this line:

x-robots-tag: noindex, noarchive, nosnippet

X-Firefox-Spdy: h2
accept-ranges: bytes
content-length: 1147940
content-type: application/pdf
date: Wed, 31 Jan 2018 14:20:39 GMT
etag: "118424-55f7ff5fc6f00"
host-header: 192fc2e7e50945beb8231a492d6a8024
last-modified: Mon, 04 Dec 2017 09:01:16 GMT
server: nginx
x-proxy-cache: HIT
x-robots-tag: noindex, noarchive, nosnippet

200 OK
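If you'd rather not click through browser tools for every file, the same check can be scripted. Below is a minimal Python sketch, not the method used above: fetch_headers sends a HEAD request with the standard library's urllib.request, and is_noindexed looks for noindex in the X-Robots-Tag header. Both function names are my own, and the sketch only handles the plain comma-separated form of the header shown in this post (it ignores bot-specific variants).

```python
import urllib.request


def is_noindexed(headers: dict) -> bool:
    """Return True if an X-Robots-Tag header lists the noindex directive."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-robots-tag":
            directives = [part.strip().lower() for part in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request and return the response headers as a dict.

    Note: converting to a dict keeps only one value per header name.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


# Headers copied from the response shown above (trimmed).
sample_headers = {
    "content-type": "application/pdf",
    "server": "nginx",
    "x-robots-tag": "noindex, noarchive, nosnippet",
}
sample_is_hidden = is_noindexed(sample_headers)
```

To check a live file, call is_noindexed(fetch_headers(url)) with the full URL of your PDF. Keep in mind that a cached response (like the x-proxy-cache: HIT above) may not reflect a header you have only just added.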

 

This article was first published at WPgarage.com 