Link Attraction Factors


A study of the factors that influence the number of links a URL published to Digg's homepage accumulates.

By Dan Zarrella, http://danzarrella.com, 2008

Introduction & Dataset

One of the most valuable aspects of being listed on Digg's homepage is that the story is seen by a large number of highly savvy social media users (including many bloggers). These users are likely to link to pages they find interesting from other sites they participate in or own. Studies have been done on various characteristics of Digg stories and how they correlate with a story's popularity. We know that stories submitted by popular users are much more likely to go popular, and we know which times are best for submitting stories. What hasn't been studied, however, is what happens to a story once it makes Digg's homepage. Beyond the number of votes the story got, how viral did it go? By measuring the number of incoming links pointing to popular stories on Digg, I've identified which characteristics increase or decrease the number of links a story gets.

Using Digg's API, I constructed a database of information on 33,322 of the 39,000 stories that became popular and were listed on Digg's homepage. For each story I also indexed two further items:
- the textual content of the target page, with scripts, CSS and HTML removed;
- the number of incoming links pointing to the URL listed in the Digg story.
I used Yahoo's Site Explorer API to gather the link data over the course of several weeks. Adding the page content grew the database from about 16 MB to over 500 MB. I therefore also created an optimized table without the text content, for calculations that did not require it.

The distribution of incoming links per story has a very long tail to the right. This is at least partly because some of the URLs seen on Digg's homepage were popular root domains (like apple.com). To identify statistically extreme outliers, I calculated the upper outer fence with the standard formula Q3 + 3 × IQR. This gave an upper boundary of 1717 links. In my calculations I only use pages with fewer than 1717 links, leaving 30,676 eligible stories, 92.06% of the total database.
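The outlier cut-off described above can be sketched as follows. This is a minimal illustration, assuming the link counts are available as a plain list of integers. The function names and sample counts are hypothetical. The quartiles come from Python's `statistics.quantiles` (default "exclusive" method), which may not match the quartile method the original analysis used.

```python
import statistics


def upper_outer_fence(values: list[int]) -> float:
    """Return Q3 + 3 * IQR, the upper 'outer fence' for extreme outliers."""
    # quantiles(n=4) returns the three cut points [Q1, median, Q3];
    # Python's default "exclusive" interpolation method is used here.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + 3 * (q3 - q1)


def drop_extreme_outliers(values: list[int]) -> list[int]:
    """Keep only the values strictly below the upper outer fence."""
    fence = upper_outer_fence(values)
    return [v for v in values if v < fence]


# Hypothetical link counts: mostly ordinary stories plus one popular root domain.
links = [12, 40, 55, 80, 120, 150, 210, 300, 450, 286000]
print(upper_outer_fence(links))      # 1196.25
print(drop_extreme_outliers(links))  # every count except 286000
```

Keeping values strictly below the fence mirrors the "fewer than 1717 links" rule in the text.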

Stories removed are shown in red on this distribution graph:

[Figure: With-Outliers Link Popularity Distribution. x-axis: Number of Links; y-axis: Number of Stories]

After removing the outliers, the clean dataset's distribution looks like this:

[Figure: Non-Outlier Link Popularity Distribution. x-axis: Number of Links; y-axis: Number of Stories]

Results

With this dataset, I first tested the quantitative values I had for each story by calculating their correlation with the number of incoming links. There are five numeric values I could test this way:
- the number of diggs a story has;
- the number of comments;
- the popularity of the submitter (measured by the number of popular stories the user has submitted);
- the length of the title;
- the length of the description.

In all five cases the correlation was too weak to be meaningful (a common rule of thumb treats only coefficients above 0.5 as strong correlations). The number of votes or comments, or the popularity of the submitter, may influence a story's chances of going popular. Once a story is listed on Digg's homepage, however, they have little measurable effect on the number of links it gets.

Criterion                  Correlation
Number of Diggs             0.22649
Number of Comments          0.313194
Popularity of Submitter     0.02973
Length of Title            -0.00248
Length of Description      -0.00035

To test the effect of other, non-numerical factors, I first calculated the average number of incoming links across the URLs in my database: 299. I then found the average number of links for stories matching a given criterion and compared that number to the overall average. To visualize each factor's effect on a URL's link accumulation, I graphed the difference from the overall average as a percentage. For instance, if a test shows that a type of story gets 598 links on average, that is a 100% difference from the overall average. Stories of that type get 100% more links than a typical story. If a factor causes stories to have only 150 links on average, that criterion has a difference of about -50% from the norm.
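The correlation test above can be sketched as follows. The text does not say which coefficient was used, so Pearson's r is an assumption here. This minimal version computes r between one numeric field and the link counts. The sample numbers are hypothetical, not taken from the dataset.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-story values: number of diggs vs. incoming links.
diggs = [650, 720, 900, 1100, 1500]
links = [310, 150, 420, 280, 390]
print(round(pearson_r(diggs, links), 3))
```

To produce the whole table above, the same function would be called once per numeric field against the link counts.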

[Figure: Average Links by Container, percentage difference from the overall average for Technology, World & Business, Science, Lifestyle, Offbeat, Entertainment, Gaming and Sports]

The first criterion I tested was the container the story was listed in on Digg. The above graph shows reasonably expected results. Stories in the Technology container accumulated 23.66% more links than average, and stories listed under Sports received 32.89% fewer. One surprise here is the Gaming container, typically thought of as very "Digg-like": URLs listed there got 20.39% fewer links than average.
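Each bar in these graphs is the percentage difference between a group's average link count and the overall average. A minimal sketch of that calculation follows, assuming each story is a hypothetical (category, link_count) pair. The sample records are invented, and the overall mean is computed from the same records rather than fixed at 299.

```python
from collections import defaultdict


def pct_diff(group_avg: float, overall_avg: float) -> float:
    """Percentage difference of a group's average from the overall average."""
    return (group_avg - overall_avg) / overall_avg * 100


def pct_diff_by_group(stories: list[tuple[str, int]]) -> dict[str, float]:
    """Map each category to its average links' % difference from the overall mean."""
    overall = sum(links for _, links in stories) / len(stories)
    sums = defaultdict(int)    # total links per category
    counts = defaultdict(int)  # number of stories per category
    for category, links in stories:
        sums[category] += links
        counts[category] += 1
    return {cat: pct_diff(sums[cat] / counts[cat], overall) for cat in sums}


# Hypothetical stories, not taken from the dataset.
stories = [
    ("Technology", 400), ("Technology", 340),
    ("Sports", 150), ("Sports", 210),
    ("Gaming", 260), ("Gaming", 240),
]
# Sort from most to least link-attracting, as in the graphs.
for category, diff in sorted(pct_diff_by_group(stories).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<12} {diff:+.2f}%")
```

The same grouping works for containers, topics or any other categorical field.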

[Figure: Average Links by Topic, positive-effect topics: Travel & Places, Programming, Arts & Culture, Playable Web Games, Software, Design, Autos, Mods, Microsoft, Apple, Security, Odd Stuff, PC Games, Linux/Unix, Gadgets, Tech Industry News, Health, Political Opinion, Political News, Music, General Sciences, Nintendo, Soccer, Hardware, World News, US Elections 2008]

Next I tested the topic each story was listed in. Because there are so many topics, I split the graph in two: topics that seemed to have a positive influence on the number of incoming links a URL received, and topics that seemed to have a negative influence. The results are similarly unsurprising. Technical and "geek"-oriented topics (technology, engineering) dominate the high-link-popularity side of the graph. Travel & Places took first place, however: stories listed in that topic received 107.75% more links than average.

[Figure: Average Links by Topic, negative-effect topics: Business & Finance, Celebrity, Xbox, Food & Drink, Space, Motorsport, Environment, Playstation, Pets & Animals, Television, Movies, Comics & Animation, Educational, American Football, Extreme, Other Sports, Gaming Industry News, Basketball, Hockey, People, Baseball, Comedy, Golf, Tennis]

On the low-link-popularity side of the graph, stories listed in sports and entertainment topics garner fewer links than the rest of Digg's topics. Tennis performs the worst, with stories listed there getting 76.25% fewer links than average. Golf follows at 57.63% fewer.

Average Links by Topic

Topic                    Average Links
Travel & Places          621.18
Programming              581.82
Arts & Culture           568.82
Playable Web Games       501.58
Software                 487.47
Design                   453.40
Autos                    439.58
Mods                     396.02
Microsoft                390.19
Apple                    365.88
Security                 361.64
Odd Stuff                352.04
PC Games                 338.39
Linux/Unix               336.64
Tech Industry News       336.33
Health                   329.36
Political Opinion        328.97
Political News           325.62
Music                    317.25
General Sciences         312.42
Nintendo                 309.30
Soccer                   308.43
Hardware                 306.05
World News               302.52
US Elections 2008        299.61
Business & Finance       297.15
Celebrity                295.44
Xbox                     294.61
Food & Drink             291.00
Space                    282.97
Motorsport               276.16
Environment              264.50
Playstation              262.01
Pets & Animals           247.82
Television               236.99
Movies                   213.82
Comics & Animation       208.35
Educational              207.19
American Football        205.75
Extreme                  195.20
Other Sports             191.03
Gaming Industry News     189.76
Basketball               184.74
Hockey                   174.71
People                   171.32
Baseball                 171.25
Comedy                   167.82
Golf                     126.69
Tennis                    71.00
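As a worked check, the percentages quoted in the text follow directly from this table and the overall average of 299 links. For Travel & Places, (621.18 - 299) / 299 ≈ +107.75%. The short sketch below reproduces the three quoted figures from the table values; only those three rows are copied in.

```python
OVERALL_AVG = 299  # overall average links per story, as reported in the text

# Average links per topic, copied from the table.
topic_avg = {
    "Travel & Places": 621.18,
    "Golf": 126.69,
    "Tennis": 71.00,
}


def pct_diff(group_avg: float, overall_avg: float = OVERALL_AVG) -> float:
    """Percentage difference of a topic's average from the overall average."""
    return (group_avg - overall_avg) / overall_avg * 100


for topic, avg in topic_avg.items():
    print(f"{topic:<16} {pct_diff(avg):+.2f}%")
# prints +107.75%, -57.63% and -76.25% respectively
```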

[Figure: Average Links by Day of Week, percentage difference from average (roughly -6% to +4%) by day of submission and day of promotion, Sunday through Saturday]

I then tested the day of the week on which stories were submitted to Digg and promoted to its homepage. Here there is a clear pattern. Stories submitted and promoted during the business week, especially early in the week, tend to get more links than those submitted or promoted on weekends. The day-of-week differences from the average are small compared to criteria like topic or container, but they still favor weekday submission and promotion.
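Bucketing stories by weekday can be sketched as below. This assumes each story carries Unix timestamps for submission and promotion. The field names `submit_ts` and `promote_ts` are hypothetical, not Digg API names. For simplicity, timestamps are converted with a fixed UTC-8 offset, which ignores daylight saving time. Python's `weekday()` numbers Monday as 0.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset for Pacific Standard Time; daylight saving is ignored (assumption).
PACIFIC = timezone(timedelta(hours=-8))
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday(): Monday == 0


def weekday_pct_diff(stories: list[dict], ts_field: str) -> dict[str, float]:
    """% difference from the overall mean links, grouped by the weekday of ts_field."""
    overall = sum(s["links"] for s in stories) / len(stories)
    sums = defaultdict(int)
    counts = defaultdict(int)
    for s in stories:
        day = DAYS[datetime.fromtimestamp(s[ts_field], PACIFIC).weekday()]
        sums[day] += s["links"]
        counts[day] += 1
    # Report days in calendar order, skipping days with no stories.
    return {d: (sums[d] / counts[d] - overall) / overall * 100 for d in DAYS if d in counts}


TUE = 1199217600       # 2008-01-01 12:00 PST, a Tuesday
SAT = TUE + 4 * 86400  # 2008-01-05 12:00 PST, a Saturday

# Hypothetical stories, not taken from the dataset.
stories = [
    {"links": 380, "submit_ts": TUE, "promote_ts": TUE + 7200},
    {"links": 340, "submit_ts": TUE, "promote_ts": SAT},
    {"links": 260, "submit_ts": SAT, "promote_ts": SAT + 7200},
    {"links": 220, "submit_ts": SAT, "promote_ts": SAT + 3600},
]
print(weekday_pct_diff(stories, "submit_ts"))
print(weekday_pct_diff(stories, "promote_ts"))
```

Calling the function once per timestamp field gives the separate "Submit" and "Promote" series.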

[Figure: Average Links by Hour (PST), percentage difference from average by hour of submission and hour of promotion, 12:00 AM through 11:00 PM]

After looking at days of the week, I looked at the hour of the day the stories were submitted and promoted. Again there is a clear pattern. Submissions and promotions during business hours, especially early in the day, get more links than those in the evening and at night. The differences from the average here, however, are larger than those for day of the week. Stories submitted or promoted between 4 AM and 9 AM PST (7 AM and noon EST) get more links than those submitted or promoted outside that period.
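The hour-of-day split can be sketched in a similar way. This assumes Unix timestamps and fixed offsets of UTC-8 for PST and UTC-5 for EST, with daylight saving ignored. The function names are hypothetical, and treating 9 AM as an exclusive end of the window is an assumption. The example also shows the three-hour gap between the two zones.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; daylight saving time is ignored (assumption).
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")


def hour_in(ts: int, tz: timezone) -> int:
    """Hour of day (0-23) at which a Unix timestamp falls in the given zone."""
    return datetime.fromtimestamp(ts, tz).hour


def in_high_link_window(ts: int, start_hour: int = 4, end_hour: int = 9) -> bool:
    """True if ts falls between start_hour and end_hour PST (end exclusive, an assumption)."""
    return start_hour <= hour_in(ts, PST) < end_hour


# 2008-01-01 13:00 UTC is 5 AM PST and 8 AM EST.
ts = 1199145600 + 13 * 3600
print(hour_in(ts, PST), hour_in(ts, EST), in_high_link_window(ts))  # 5 8 True
```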

[Figure: Average Links by Keyword, positive-effect keywords, for occurrence in the title and in the description: pics, iphone, top 10, free, ubuntu, picture, awesome, vista, how to, list, riaa, mac, apple, most, released, linux, breaking, secret, best, sex, revealed]

I then tested whether the occurrence of certain popular keywords in the title or description affected the average number of incoming links the stories received. Again, because of the number of words I tested, I split the graph into positive-effect and negative-effect keywords. In the top half we see keywords like "how to", "top 10", "list", "free" and "pics". Superlatives commonly used in top-10 and list-type stories appear there as well. Certain brand names also have a positive effect on the number of incoming links a story receives.
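Testing a keyword against the title or description can be sketched with a case-insensitive, whole-word regular expression. The original study does not say how keywords were matched (substring or whole word, "how to" or "howto"). Whole-word matching, the helper names and the sample stories are therefore assumptions.

```python
import re
from typing import Optional


def contains_keyword(text: str, keyword: str) -> bool:
    """Case-insensitive match of a whole word or phrase (not part of a longer word)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def avg_links_with_keyword(stories: list[dict], keyword: str, field: str) -> Optional[float]:
    """Average links over stories whose `field` contains keyword, or None if none do."""
    matches = [s["links"] for s in stories if contains_keyword(s[field], keyword)]
    return sum(matches) / len(matches) if matches else None


# Hypothetical stories; "Stop 100" must not count as "top 10".
stories = [
    {"title": "Top 10 Linux Tricks", "description": "A list of tips", "links": 500},
    {"title": "Interview with a Digg founder", "description": "A long interview", "links": 150},
    {"title": "Stop 100 bugs", "description": "Debugging notes", "links": 50},
]
print(avg_links_with_keyword(stories, "top 10", "title"))  # 500.0
print(avg_links_with_keyword(stories, "digg", "title"))    # 150.0
```

Whole-word matching avoids counting "Stop 100" as "top 10". A plain substring test would not.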

[Figure: Average Links by Keyword, negative-effect keywords, for occurrence in the title and in the description: bush, microsoft, make, ever, iraq, new, xbox, super, ron paul, report, movie, review, interview, video, halo, woman, game, guy, digg, wii, nintendo]

On the negative-effect side of the graph there is one major surprise. Stories that mention the word "digg" in their title received 27.30% fewer links than the average story. Stories that contained it in the description field got 21.56% fewer. Self-reference may help a story become popular and reach Digg's homepage, but it reduces the number of links the story gets once it's there. The words "report", "interview" and "review" also have a negative impact on link acquisition. These words contrast with positive-effect keywords like "how to", "top 10", "list", "free" and "pics". The negative-effect words tend to signal longer, heavier, more text-based content, while the positive ones signal quick, well-chunked reads.

[Figure: Average Links by Keyword, combined, showing occurrence in the title, the description and the page content for each keyword, sorted by the page-content average]

Occurrence in the title and in the description tends to have a similar influence. The effect of occurrence in the textual content of the story's target URL, however, varies a good deal. Above is a combined keyword occurrence graph, sorted by the average links accumulated by URLs with the keyword in the content of the page. I've also created two tools that use this keyword occurrence data. One displays the effect a keyword has on stories that use it in their title and description. The other analyzes entire potential titles.
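The page-content test needs the visible text of each target page, with scripts, CSS and markup removed. The original removal functions are not described. The sketch below swaps in Python's standard `html.parser` module: it skips the contents of `<script>` and `<style>` and then applies a whole-word keyword check. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def page_text(html: str) -> str:
    """Visible text of an HTML page, with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def page_mentions(html: str, keyword: str) -> bool:
    """Case-insensitive whole-word keyword check against the page's visible text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, page_text(html), re.IGNORECASE) is not None


html = (
    "<html><head><style>body {color: red}</style>"
    "<script>var ubuntu = 1;</script></head>"
    "<body><h1>Installing Linux</h1><p>A how to guide.</p></body></html>"
)
print(page_text(html))  # Installing Linux A how to guide.
```

In this example, "ubuntu" appears only inside the script, so the page does not count as mentioning it.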

Average Links by Keyword (average links when the keyword occurs in the description, title or page content)

Keyword      Description   Title    Page
ubuntu       372.51        382.31   384.86
linux        341.13        326.08   372.37
how to       372.03        366.85   358.51
iphone       410.59        415.45   354.06
vista        347.32        367.90   352.61
awesome      298.25        368.19   350.39
picture      347.84        369.60   345.54
mac          335.99        335.93   344.14
ron paul     243.08        271.50   344.03
secret       306.34        318.67   342.69
apple        344.50        335.63   340.82
riaa         363.28        341.75   339.62
best         338.83        317.11   335.38
released     286.18        327.65   334.56
pics         397.57        432.21   330.54
guy          237.88        229.27   330.26
bush         298.96        299.89   328.66
microsoft    297.69        298.30   327.65
make         296.73        298.15   327.55
top 10       490.02        402.85   326.36
most         318.24        330.08   325.03
breaking     319.54        325.38   323.75
list         361.74        357.38   320.99
woman        292.11        251.40   320.99
revealed     281.98        304.90   320.84
ever         311.44        297.61   319.37
free         364.38        394.24   317.45
sex          279.49        311.34   315.29
super        265.14        280.60   314.72
report       281.42        271.45   314.33
iraq         296.52        290.24   309.37
interview    247.86        262.11   307.90
game         228.66        246.01   305.06
new          295.07        283.32   304.62
movie        244.69        269.82   303.49
digg         234.55        217.37   299.29
review       317.49        266.09   294.87
video        258.76        256.70   291.47
wii          193.65        196.20   281.07
halo         260.07        251.85   271.98
nintendo     187.16        179.95   264.18
xbox         248.90        281.77   262.96
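The text above mentions a second tool that analyzes whole candidate titles, but does not describe how it combined keywords. The sketch below shows one hypothetical approach. It finds every keyword a title contains, looks each one up in the Title column of this table, and reports that keyword's percentage difference from the overall average of 299 links. Only a handful of rows are copied in. The scoring rule (a simple mean of the matched keywords' differences) is an assumption, not the author's method.

```python
import re
from typing import Optional

OVERALL_AVG = 299.0  # overall average links per story, as reported in the text

# A few rows from the "Title" column of the keyword table.
TITLE_AVG_LINKS = {
    "top 10": 402.85,
    "pics": 432.21,
    "free": 394.24,
    "how to": 366.85,
    "review": 266.09,
    "digg": 217.37,
}


def keyword_effects(title: str) -> dict[str, float]:
    """% difference from the overall average for each known keyword found in the title."""
    effects = {}
    for keyword, avg in TITLE_AVG_LINKS.items():
        if re.search(r"\b" + re.escape(keyword) + r"\b", title, re.IGNORECASE):
            effects[keyword] = (avg - OVERALL_AVG) / OVERALL_AVG * 100
    return effects


def title_score(title: str) -> Optional[float]:
    """Hypothetical score: mean effect of the matched keywords, or None if none match."""
    effects = keyword_effects(title)
    return sum(effects.values()) / len(effects) if effects else None


print(keyword_effects("Top 10 Free Photo Editors (Pics)"))
print(title_score("A Digg review"))
```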