Protecting PDF Files From Being Indexed in Search Engines

Posted on April 7, 2008 by admin.
Categories: Internet - General.

I have noticed a growing problem among companies that sell PDF files online: the files they charge for end up indexed by search engines. I went to get a copy of the Zend PHP 5 Certification Study Guide and noticed that a Google search for PDF files matching "php 5 study guide" turns up the very PDF they sell.

This happens because the PDF file itself resides within the site's web space and someone out there has a link pointing to it. Google follows the link to the file and indexes it.

To prevent this from happening, store the files you want to restrict outside of web space (i.e. in a folder ABOVE your web root) and let PHP (or ASP or whatever) serve the file.

FilePath could be something like “C:\Web\PDF\FileName.pdf”, where your web root is “C:\Web\wwwroot”, for example. The important thing is that the file resides outside of the web space, so no URL maps directly to it.

In PHP you can do something like this:

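<?php
// logged_in() is a placeholder for your own authentication check
if (logged_in()) {
    header('Content-Type: application/pdf');
    // FilePath is the full path to the PDF stored outside the web root
    print file_get_contents('FilePath');
    exit;
} else {
    // forward to login code here
}
?>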
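
If the script picks the file from a request parameter, it may be worth validating the name before serving it. The following is only a rough sketch under a few assumptions: resolve_protected_file() and send_pdf() are hypothetical helper names that are not part of the original example, the protected folder contains only PDFs, and PHP 7.1 or later is available for the type hints. It strips directory components so a request such as "../secret.txt" cannot escape the folder, and it streams the file with readfile() rather than loading it into memory first.

```php
<?php
// Hypothetical helper: map a requested name to a file inside $baseDir.
// Returns the full path, or null if the name is unsafe or the file is missing.
function resolve_protected_file(string $baseDir, string $requested): ?string
{
    // basename() drops directory components such as "../"
    $name = basename($requested);

    // Only serve files with a .pdf extension (an assumption for this sketch)
    if ($name === '' || strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'pdf') {
        return null;
    }

    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $name;

    return is_file($path) ? $path : null;
}

// Hypothetical helper: send the PDF headers, then stream the file to the client.
function send_pdf(string $path): void
{
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Possible usage, with logged_in() as the same placeholder as above:
//
// $path = resolve_protected_file('C:\Web\PDF', $_GET['file'] ?? '');
// if (logged_in() && $path !== null) {
//     send_pdf($path);
//     exit;
// }
```

Returning null for anything unexpected keeps the calling code simple: it either has a path that is known to sit inside the protected folder, or it falls through to the login or error handling.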

As always, if anyone has a better or different solution, please post a comment.


Difficulty With Naming A Website

Posted on by admin.
Categories: Internet - General.

One of my favorite dev sites is trying to come up with a name for a new site for a project they are working on. You can find the article on codinghorror.com. This brings up an important issue: it can be really difficult to name a website. I chose this site's name because GoDaddy suggested it to me as an alternate name.

I wonder how many people agonize for days over what domain to choose. There are so many brands out there that you have to be careful not to infringe on a name someone else already uses, and, more importantly, you don’t want a URL that is too closely tied to something you don’t want to be associated with. I believe that codinghorror.com is doing a good thing in asking their readers to help pick the domain.

I think some important things to consider are:

1. Keywords in the domain

2. Branding Potential (think t-shirts, business cards, etc, depending on the type of site)

3. The professionalism of the domain (whether the site is an ecommerce store or a personal blog should affect which domains are appropriate)

4. Past uses, existing links, etc. (if the domain recently expired, it could have links pointing to it, problems with Google, or other things you need to investigate)

5. Is the domain easy to remember?
