Digg Goes to the Source to Avoid Duplicates

Digg’s CSS started to go a little crazy about 10 minutes ago, which is usually a good sign that new features are about to show up… and Digg didn’t let us down.

The main part of this update is a new ‘Dupe Detection’ system that users have been screaming for. Like some of the other recent changes Digg has rolled out, this looks like a really useful one that should make the submission process easier and cut down on duplicate submissions.

The changes begin as soon as you attempt to submit an article. Digg immediately goes to the source of your submission and scrapes the page, analyzing various aspects to determine if the content might be a duplicate.

If Digg suspects the content might be a duplicate, it displays the possible duplicate submissions for you to review before you even begin the submission process.

[Image: Digg dupe check]

Not only will this save you from spending time on a submission that would end up being a duplicate anyhow, it will also help defeat some of the tricks users have tried in order to fool the duplicate content system, such as adding extra parameters to the end of the submission URL.

“To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. Most common are the same stories from the same site, but with different URLs. Our R&D team came up with a solution that identifies these types of duplicates by using a document similarity algorithm. Look for a separate tech blog post on how this works, but it has proven to be a reliable way of identifying identical content from the same source.” — Digg Blog
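
Digg has not published the algorithm yet, so the following is only a rough sketch of one common document similarity technique (word shingles compared with Jaccard similarity), not Digg's actual method. The function names, the shingle size and the 0.8 threshold are all assumptions made for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_like_duplicate(text_a, text_b, threshold=0.8):
    """Hypothetical cutoff; a real system would have to tune this value."""
    return similarity(text_a, text_b) >= threshold


# The same story scraped from two URLs that differ only by a query string
# yields the same text, so padding the URL with parameters no longer helps.
story = "Digg rolled out a new system that checks submissions for duplicates"
same_story = "Digg rolled out a new system that checks submissions for duplicates."
other_story = "A new browser release improves startup time and memory usage"

print(similarity(story, same_story))
print(looks_like_duplicate(story, other_story))
```

Because the comparison runs on the page content rather than on the URL, the submitted address does not matter in this sketch. That matches the behavior Digg describes, whatever the real implementation looks like.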

At this point you have to either abandon the submission or click the ‘My story has no duplicates’ button, which could put your account in jeopardy if the story does turn out to be a duplicate. (Keep in mind that users were warned about submitting duplicate content even before this system existed, and Digg will ban accounts for submitting obvious duplicates.)

“We’ll also be monitoring when certain Diggers choose to bypass high-confidence duplicates and will use this data to continue to improve the process going forward.” — Digg

Is the new system perfect?

No. The pictures below show a story that was released just a few minutes before the submission attempt.

Story Submitted:

[Image: CNN story]

Digg found the following potential dupes:

[Image: Digg dupe check with wrong matches]

However, at least you can make this determination at the beginning of the process instead of after you have filled out all the submission details.

If Digg does not detect any duplicate submissions after scanning the source, then you will not see this screen and will be shown Step 2 of the submission process.

One additional benefit of Digg scraping the source before you can submit is that it now automatically pulls the title and a description (the first paragraph) as a recommendation.
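
How Digg extracts these fields is not documented, so here is a minimal sketch of the general idea using only Python's standard `html.parser` module. The class name `TitleAndFirstParagraph` and the helper `suggest_fields` are hypothetical, and a real scraper would also have to cope with messy markup, character encodings and pages whose first `<p>` is not the story text.

```python
from html.parser import HTMLParser


class TitleAndFirstParagraph(HTMLParser):
    """Collect the text of <title> and of the first non-empty <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._current = None  # tag currently being captured, if any
        self._done_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._current = "title"
        elif tag == "p" and not self._done_paragraph:
            self._current = "p"

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.description += data

    def handle_endtag(self, tag):
        if tag == self._current:
            # Stop looking at paragraphs once one with real text was seen.
            if tag == "p" and self.description.strip():
                self._done_paragraph = True
            self._current = None


def suggest_fields(html):
    """Return (title, description) suggestions for a submission form."""
    parser = TitleAndFirstParagraph()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description.strip()


page = """<html><head><title>Example Story</title></head>
<body><p>First paragraph of the <b>story</b>.</p><p>Second paragraph.</p></body></html>"""
title, description = suggest_fields(page)
print(title)
print(description)
```

The suggestions are only a starting point; as in Digg's form, the submitter would still be free to edit both fields.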

[Image: Digg submit form autofill]

In addition to the ‘Dupe Detection’ system being rolled out, Digg also made a few other minor changes.

Not sure how many people noticed, but Digg had removed the ‘views’ count from the DiggBar. It originally showed only DiggBar views, and since non-Digg users were being redirected, the data was inaccurate at best.

With today’s updates, the ‘views’ counter has returned and should now show all views of the content, not just the DiggBar views.

[Image: DiggBar views]

When looking at the various sections of Digg, you will notice a ‘more’ button that appears when you hover over an article’s description. Clicking the ‘more’ button takes you to the article page, where you can view the ongoing discussion in the comments.

[Image: Digg ‘more’ button]

All in all, I think the addition of the new ‘Dupe Detection’ and the new submission process will help a lot in reducing duplication and the time it takes to submit your stories.

Edit: This post just hit the front page of Digg… Funny part is that it went to the front page at the exact same time as the Digg Blog post about… the same exact topic.

[Image: Digg front page with the duplicate]

I actually did not know the Digg blog post was out when I wrote this article, and only found out it was up when Jen emailed me the link.

Also, the two articles are quite different: Digg’s announces the feature, while mine explains what it does and walks through the process.

So I would hardly consider the example above to be a failure in the detection system.

Comments

7 Responses to “Digg Goes to the Source to Avoid Duplicates”

  1. Andrew Warner on June 30th, 2009 11:42 pm

    This is very helpful. You pointed out more details than any other site that covered these changes.

  2. Tyrone on July 1st, 2009 12:33 am

    This is a fascinating explanation of a new feature on Digg. Brent Csutoras, I salute you!

  3. fsdafsd on July 1st, 2009 5:19 am

    The "more" button is fucking annoying. I hope other people start complaining about it.

  4. Television Spy on July 1st, 2009 1:46 am

    good but raises questions about the bans and whether a user legitimately believes their article is unique compared to the other content.

    What about sites that quote an article in large part, wouldn't that be tracked and considered as very similar to the original source. For example say an innacurate article gets dugg, then an article on another site criticizing that article's inaccuracies gets dugg later on – the critical site could get flagged as being too similar – especially if it quotes parts of the original article.

  5. Lincoln Nguyen on July 2nd, 2009 9:11 pm

    Long time coming. They shouldve implemented this before the shout feature

