Monday, September 21, 2009

How DocMonk Prevents Search Engines from Caching Your Premium Content

One way that DocMonk makes life easier for document recipients is by eliminating the need to log in to the site. Users need only click the link they receive from the publisher, and they can immediately download their personalized PDF document. The main way that DocMonk protects publishers' content is by keeping the URL a secret between the publisher and the subscriber.
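As a rough illustration of what a "secret" URL might look like, here is a minimal sketch that builds a link around a long random token. This post does not describe how DocMonk actually generates its links, so the base URL, the path layout, and the function name are all assumptions made for the example.

```python
import secrets

# Hypothetical base URL; the real DocMonk link format is not described in this post.
BASE_URL = "https://example.com/d/"


def make_secret_link(base_url: str = BASE_URL) -> str:
    """Return a download link containing an unguessable random token."""
    # token_urlsafe(32) draws 32 random bytes from the OS's secure source
    # and encodes them as URL-safe text.
    token = secrets.token_urlsafe(32)
    return base_url + token


if __name__ == "__main__":
    print(make_secret_link())
```

The point of the long random token is that nobody can guess the link; anyone who is handed the link, or who stumbles across it, can still use it.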

Of course, a secret URL alone isn't very good security. One risk is search engines, which may find the page and then index the content or allow others to find it. DocMonk keeps search engines away from publishers' valuable content with a robots.txt file, which directs search engine crawlers to go away. However, a robots.txt directive is kind of like a "No Right Turn" sign; less civically minded members of the community might choose to ignore it. To put some teeth into our "no search engine" policy, we use CAPTCHAs to make sure every visitor is a living, breathing human being.
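To make the "No Right Turn" sign concrete, here is a small sketch of a robots.txt that turns every crawler away, checked with the robots.txt parser from Python's standard library. The file contents and the example URL are assumptions; the robots.txt that DocMonk actually serves is not shown in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents that ask all crawlers to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Report whether a crawler that obeys robots.txt would fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler checks first and then stays away.
    print(crawler_may_fetch("Googlebot", "https://example.com/d/some-secret-token"))
```

Note that the check happens on the crawler's side. Nothing in robots.txt stops a crawler that simply never asks.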

We've all seen CAPTCHAs. They are the distorted strings of characters that you must type when a website is trying to prevent spam. DocMonk uses reCAPTCHA to verify that the entity trying to download a protected document is a human being. By ensuring that only humans get through to the document, DocMonk prevents search engines from indexing and caching the documents. Once DocMonk determines that a particular user is really human, subsequent attempts to download documents usually do not require completing a new CAPTCHA challenge.
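The flow described above can be sketched roughly as follows. This is not DocMonk's code: the `Session` class, the `handle_download` function, and the return values are hypothetical, and the actual CAPTCHA check is passed in as a plain function rather than calling the reCAPTCHA service. The sketch also simplifies "usually" to "always" once a visitor has been verified.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Session:
    """Hypothetical per-visitor state, for example backed by a cookie."""
    verified_human: bool = False


def handle_download(
    session: Session,
    captcha_response: Optional[str],
    verify_captcha: Callable[[str], bool],
) -> str:
    """Decide whether to serve the document or show a CAPTCHA challenge."""
    # A visitor who already passed a challenge is not asked again.
    if session.verified_human:
        return "serve_document"
    # No answer submitted yet: present the challenge.
    if captcha_response is None:
        return "show_captcha"
    # Check the submitted answer; remember a success for later downloads.
    if verify_captcha(captcha_response):
        session.verified_human = True
        return "serve_document"
    return "show_captcha"


if __name__ == "__main__":
    def fake_verifier(answer: str) -> bool:
        # Stand-in for a real CAPTCHA check, used only for this demo.
        return answer == "correct"

    visitor = Session()
    print(handle_download(visitor, None, fake_verifier))
    print(handle_download(visitor, "correct", fake_verifier))
    print(handle_download(visitor, None, fake_verifier))
```

A crawler never supplies a valid answer, so in this sketch it only ever sees the challenge page and never the document.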

CAPTCHAs are one way that DocMonk strikes a balance between security and usability.
