Protecting PDF Files From Being Indexed by Search Engines
I have noticed a growing problem among companies that sell PDF files online: the files they sell can often be found through a search engine. I went to get a copy of the Zend PHP 5 Certification Study Guide, and when I did a Google search for PDF files matching "php 5 study guide", I found that the PDF they sell is actually indexed by Google.
This happens because the PDF file itself resides within the site's web space (a folder the web server serves directly), and someone out there has a link pointing to it. Google follows the link to the file and indexes it.
To prevent this, store the files you want to restrict outside of web space (i.e., in a folder above your web root) and let PHP (or ASP, or whatever server-side language you use) serve the file after checking that the visitor is allowed to have it.
FilePath could be something like "C:\Web\PDF\FileName.pdf" where your web root is "C:\Web\wwwroot", for example. The important thing is that the file resides outside of the web space, so there is no URL that points straight at it.
In PHP you can do something like this:
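<?php
// logged_in() stands for your own authentication check; it is not a built-in PHP function.
if (logged_in()) {
    $filePath = 'FilePath'; // e.g. C:\Web\PDF\FileName.pdf, outside the web root
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($filePath));
    readfile($filePath); // stream the file to the browser without loading it all into memory
} else {
    // Not logged in: send the visitor to your login page (the URL is a placeholder).
    header('Location: /login.php');
}
exit; // no closing ?> tag, so no stray whitespace gets appended to the PDF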
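If the file name comes from the request (for example download.php?file=guide.pdf), FilePath has to be built carefully so that a visitor cannot walk out of the PDF folder with something like ?file=../../secret. Here is a minimal sketch, assuming the script sits directly in the web root and the PDFs live in a PDF folder one level up, as in the C:\Web layout above. The function name pdf_path and the folder name PDF are hypothetical.

```php
<?php
// Hypothetical helper: map a requested file name to a path inside the PDF folder
// that sits next to the web root (e.g. C:\Web\PDF when this script is in C:\Web\wwwroot).
function pdf_path(string $requested): ?string
{
    // dirname(__DIR__) is the folder one level above the one this script lives in.
    $dir = dirname(__DIR__) . DIRECTORY_SEPARATOR . 'PDF';

    // basename() drops any directory part, so "../../guide.pdf" becomes "guide.pdf".
    $name = basename($requested);

    // Refuse empty names and anything that is not a .pdf file.
    if ($name === '' || strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'pdf') {
        return null;
    }
    return $dir . DIRECTORY_SEPARATOR . $name;
}
```

In the download script you would then set $filePath = pdf_path($_GET['file'] ?? ''); and send a 404 instead of the file when it returns null or when file_exists($filePath) is false.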
As always, if anyone has a better or different solution, please post a comment.