
About the Author

Hi, I'm Josh Bachynski. You may remember me from such rants as SEO Hax0r and Google You Are Evil! I am an ethicist and SEO with 14 years of experience. I write about ethics and I do SEO consulting. Feel free to contact me at joshbachynski [at] gmail dot com.


The Complete Google Leaked PANDA Do & Don’t LIST - 2011 to Present

(C) Copyright Josh Bachynski, 2014

(Note: If you would rather watch a video of this article, including more explanation of how I got the Panda factors, go here.)


The Ethics Of Panda

Google sets the Panda SEO rules according to their own subjective standards, which they do not publish beyond a list of vague, unhelpful questions.

Back in March, Matt Cutts and I had a discussion about the ethics of Panda. My argument was that it was immoral for Google to obfuscate the details of a quality algorithm, because:

  1. it so clearly disenfranchises thousands of sites, without warning, according to nothing other than Google’s subjective opinion of what they find “spammy” (a thinly veiled euphemism devised to punish sites, which, we have to consider, must include sites that simply do not fit into Google’s revenue model); and
  2. there should be no danger (or so I thought, see below) in simply telling us exactly what is low quality or high quality in their eyes. The irony is that most webmasters do want to have a high quality site, and we have no choice but to try to rank highly in Google.

So we would happily comply, if only they would publish the rules to follow, instead of leaving us with a list of vague questions written by Matt Cutts (he admitted this to me), on which, for reasons not hard to infer, he put Amit Singhal’s name instead.

They have failed to do so, preferring a totalitarian culture of misinformation.

If they are not going to tell us, we have no choice but to do it ourselves.

The List: High Quality “Do” Factors, Low Quality “Don’t” Factors

The following is a complete compilation of all the Panda SEO leaks, drawn directly from various Google sources, that tell us, fairly clearly, what Google subjectively considers to be evidence of high quality web page factors versus low quality web page factors.

Sites that want to avoid (or fix) any Panda issues would be wise to make sure their indexed pages have all of the high quality factors and none of the low quality factors.* Non-indexed pages do not count towards Panda (JM, Sep 13, 2013), nor does AdWords or non-Google traffic (JM, Sep 23, 2013). JM also leaked, on Mar 28 and Jun 6, 2014, that Panda does crawl pages looking for on-page factors to give them a “quality score”, not to mention the ‘Processing web pages based on content quality’ patent (see Terence Mace).

References:

JM = John Mueller, Google Webmaster Trends Analyst
MC = Matt Cutts, Head of Webspam (on indefinite leave at time of writing)
QRG = Google’s Quality Rater Guidelines (2012 to 2014; totally rewritten)
MO = Maile Ohye, Google Programs Tech Lead
Zin = Zineb Ait Bahajji, Google Webmaster Trends Analyst
PED = Pedro Dias, Ex-Google Employee
PF = Pierre Far, Google Webmaster Trends Analyst
Wyz = Michael Wyszomierski, Google Product Quality Team
PAT = Various Google Patents (as reported to me by Terence Mace)
GI = Gary Illyes, Google Engineer


HIGH QUALITY FACTORS (in no particular order):

  1. Good usage metrics showing User Satisfaction with your content / presentation. Although JM outright denies using “analytics bounce rate”, he has mentioned numerous times that user satisfaction is DIRECTLY important, and keeps implying it is directly tracked (inc. JM, Dec 2, 2013. Zin & JM, Dec 20, 2013. MO, SMX West, Mar 11-13, 2014. Wyz, Jul 3, 2014. NEW: GI, SMX East, 2014). Some anecdotal experiments have shown this as well (seroundtable “changed prices experiment”, Jul 2013), and there is PAT support too. Make sure the majority of visitors are completing the desired task on your pages.

  2. Positive Social Shares / Mentions (JM, Dec 2, 2013. QRG, Mar 2014.)

  3. Positive “Reviews” on an Independent Google Verifiable Source (JM Feb 24, 2014)

  4. Authoritative Outlinks in Your Content / Citing Your Sources (JM June 20, 2014. QRG Mar 2014.)

  5. .com, .net, and .org Are a Quality / Trust Factor (MC, Sep 11, 2013)

  6. Address and/or Contact Info Clearly Listed on Each Page (QRG, Mar 2014)

  7. Robust About Us Info, Including Mission Statement, Company Directory, and Other Onsite Signs of a Legitimate Business (QRG, Mar 2014)

  8. Robust Contact and/or Customer Service Information (QRG, Mar 2014)

  9. A Very Positive Reputation On Blogs and Forums, etc. (QRG, Mar 2014)
    Where exactly? The QRG lists:

    1. “News articles, Wikipedia articles, blog posts, magazine articles, forum discussions, and ratings from independent organizations can all be sources of reputation information… Yelp, Better Business Bureau... Amazon, and Google Product Search.”

  10. Topical Experts Reference Your Site (On Web and Social) (QRG, Mar 2014). So it is not just about the important topical PAGES that reference you positively, but also about the important topical PEOPLE who do, and how many of them do.

  11. Date Info on Every Page (“Last Updated”), Kept Current, Including Copyright (QRG, Mar 2014)

  12. Clear Difference in Design Between Main Content And Supplemental Content (QRG, Mar 2014)

  13. Long-Standing Domain Name, Long-Standing Public Domain Registration (QRG, Mar 2014. EDIT: JM denied this is a ranking factor “as far as he knows”, Aug 25, 2014; he may not know every little detail)


Rampant Speculation About Other Possible High Quality Factors:
  • Test the Shopping Cart for Efficiency and Ease of Use (QRG, Mar 2014) (it is not hard to imagine the algo trying your shopping cart to see if it loads)
  • Web Forms Need Auto-Complete (MC, SMX Advanced, 2014) (lacking it could also be a bad signal, or it could just be part of user metrics. JM confirms that neither is a Panda factor “as far as [he] knows”, Oct 6, 2014.)



LOW QUALITY FACTORS (in no particular order):

  1. Bad usage metrics showing possible User Dissatisfaction with your content / presentation (including speed, UI, whitespace (or lack thereof), too many options, bad/thin/poorly written content, not answering their problem / question quickly or well enough, etc.) (JM, Dec 2, 20, 2013. Feb 14, 24, Jun 2, 2014. MO, SMX West, Mar 11-13, 2014. Wyz, Jul 3, 2014. PAT support as well. NEW: GI, SMX East, 2014). Ignore or deny this at your own risk.

  2. EDIT: Duplicate or Aggregate, “Tag”, or “Category” Content (MC, Sep 11, 2013. Zin, Dec 20, 2013. JM, Nov 4, 2013, Jun 2, Oct 10, 2014 re. Panda 4. QRG, Mar 2014). NOTE: The issue is not duplicate content per se; the issue is thin content that “users might notice”, keyword stuffing of the “tag” or “category” pages, and offsite duplicate content.

  3. Duplicate Titles and Meta Descriptions (MC, Nov 18, 2013)

  4. Aggressive “search phrase” keyword use onsite, INCLUDING: URL string, page content, AND HTML code like TITLE or ALT attributes (JM, Dec 2, 2013, Jun 6, Aug 11, 2014. MC, SMX West, Mar 13, 2014. PAT re: URLs with “generic” words in them, e.g., “bestherbalpills.com”)

  5. EDIT: Keyword-Stuffed Internal Links on Blog Pages, etc. (NOTE: this is a problem both with the number of keywords and with the ratio of links to text on the page; Google has admitted both are signals of a “low quality” page in their eyes. JM, Dec 2, 2013, Aug 25, 2014. PAT)

  6. Off-Topic / Multi-Topic Links or Content Onsite (JM hangouts: Aug 12 & Nov 18, 2013; Apr 7, 2014). E.g., a site about finance can’t talk about cooking recipes in its blog.

  7. Clone Sites Are a Strong Panda Factor (JM, Mar 10, 2014). Don’t forget that Google’s canonicalization algo will auto-301 sufficiently identical sites to a single site whether you want it to or not (SER, Feb 25, 2014).

  8. Old or Outdated or Mistaken Facts/Info (QRG Mar 2014. JM Feb 14, 2014. New Scientist, Aug 22, 2014.)

  9. Garbage text, single-sentence pages, spun text, bad construction / spelling / grammar, bad search results pages, errors on the page, etc. (QRG, Mar 2014. MC, SMX Advanced, 2014. PED, Mar 10, 2014. JM, Jun 6, 20, 2014)

  10. Made for Ads Site: ads take up 50%+ of the area above the fold, and users quickly click on them (QRG, Mar 2014. JM, Feb 24, Mar 24, 2014)

  11. EDIT: Non-Useful 404s, Excessive 500-Level Errors, or Any PHP / MySQL Errors (JM, Feb 24, Oct 10, 2014. QRG, Mar 2014. JM once again denied that excessive 404s are an issue, other than eating up your crawl budget and not passing any link juice; and again, if your users notice, this is not good)

  12. Main Content (purpose of page) is below the fold (JM, May 26, 2014. QRG Mar 2014)

  13. Excessive, Unmarked, Deceptive, Interstitial, In-between the Text Ads and Popups (QRG Mar 2014)

  14. Bad Reputation on Independent Sites (like BBB, Wiki, Scamreport, etc.) (QRG Mar 2014. Also “Bad Merchant” algo, Mar 9, 2013. SEL)

  15. Blocking Googlebot from Onsite CSS or JS Important for Design (JM, June 2, 16, 2014. MO, SMX Advanced, June 2014)

  16. Supplemental / Sidebar Content Useless or Distracting (QRG Mar 2014)

  17. Low Quality / Spammy / Duplicated / Auto-Generated User Generated Comments / Posts (QRG, Mar 2014. JM, June 2, 16, 2014). Possible negative SEO exploit.

  18. EMDs (Exact Match Domains) Held to Higher Standards (JM, Dec 2, 2013. PAT re: URLs with “generic” words in them)

  19. Longer EMDs Are a Spam Factor (MC, SMX West, Mar 13, 2014, private conversation. PAT re: URLs with “generic” words in them)

  20. Slow Speed Is a Demoting Factor, especially over 20 seconds to download, OR if it creates ANY user dissatisfaction (check other browsers and mobile devices too) (MC, SMX Advanced, June 2013)

  21. Ecommerce, Health, Legal, and Similar “Your Money or Your Life” (YMYL) Sites Held to Higher Standards to Protect Users (QRG, Mar 2014). This can include ANY site that advises on health or financial matters, or that sells anything to people, including any service (law, plumbing, etc.) or any purchase (real estate, ecommerce, etc.)

  22. Reading Level Is Too Low (for YMYL): stilted, simple, obvious text (QRG, Mar 2014). Example from the QRG: “Pandas eat bamboo. Pandas eat a lot of bamboo. It’s the best food for a Panda bear.”

  23. Broken Links / Images Don’t Load / Site Is Not Maintained / Pages Appear Abandoned (QRG, Mar 2014)

  24. ANY Affiliate or “Monetized Links” or “Sneaky Redirects” to affiliate sites (that are not nofollowed? or that are cloaked) (JM & PF, Nov 18, 2013. QRG, Mar 2014)

  25. Doorway pages (e.g. /law-chicago, /law-newyork, etc.) (JM, Jul 15, 2014)
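Factor 3 in the list above (duplicate titles and meta descriptions) is one of the few you can audit mechanically on your own site. Below is a minimal, hypothetical sketch using only Python's standard library: it assumes you already have each indexed page's HTML as a string keyed by URL (crawling is left out), and the names `HeadParser` and `find_duplicates` are my own illustrations, not anything Google has published. It only catches exact duplicates after collapsing whitespace and ignoring case.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def normalize(text):
    # Collapse whitespace and ignore case so trivial variations still match.
    return " ".join(text.split()).lower()


def find_duplicates(pages):
    """pages: dict of URL -> HTML string (hypothetical input format).

    Returns two dicts mapping each repeated title / description to the
    sorted list of URLs that share it.
    """
    titles, descriptions = defaultdict(list), defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        parser.close()
        if parser.title.strip():
            titles[normalize(parser.title)].append(url)
        if parser.description.strip():
            descriptions[normalize(parser.description)].append(url)
    dup_titles = {t: sorted(u) for t, u in titles.items() if len(u) > 1}
    dup_descs = {d: sorted(u) for d, u in descriptions.items() if len(u) > 1}
    return dup_titles, dup_descs


if __name__ == "__main__":
    site = {
        "/a": "<html><head><title>Blue Widgets</title>"
              "<meta name='description' content='Buy widgets'></head></html>",
        "/b": "<html><head><title>Blue  widgets</title>"
              "<meta name='description' content='Red widgets'></head></html>",
        "/c": "<html><head><title>Contact Us</title>"
              "<meta name='description' content='Buy widgets'></head></html>",
    }
    dup_titles, dup_descs = find_duplicates(site)
    print(dup_titles)  # {'blue widgets': ['/a', '/b']}
    print(dup_descs)   # {'buy widgets': ['/a', '/c']}
```

Any URL group this reports is a candidate for a unique title or description; near-duplicates (same template, one word changed) would need a fuzzier comparison than this sketch does.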


Rampant Speculation on Other Bad Factors (inc. Google Patents -- NEW!):

Thanks to Terence Mace for bringing these Google patent leaks to my attention! I list them here because, although for some of them we have no additional Google statement that they are specifically Panda-related, they are definitely worth watching out for regardless. You can get more information on Terence's great work on Google patents here.

  • Any other Google penalty or demotion, especially Penguin. JM has implied this (Oct 11, 2013. Jan 31, 2014. There is some PAT support for this as well), but has also outright denied, numerous times, that any other Google demotion or penalty is a bad Panda factor. At any rate, you’ll want to clean this up anyway, if they ever let you; you might just want to start again. Google’s new policy is not very forgiving and doesn’t want to be (MC, themoralconcept.net. JM, Aug 11, 2014).
  • Having the same IP or DNS as known advertising networks or content farms. You can check this in Bing by searching for your IP with the “ip:” search operator.
  • Having a domain name that is a misspelling of a genuine site
  • URLs containing generic text (e.g., “bestherbalpills.com”; SIMILAR to another Panda admission, see above)
  • The inclusion of certain text strings on a page; the examples given are “domain is for sale”, “buy this domain”, and “this page is parked” but there are likely others. (WARNING: potential Negative SEO exploit - watch your user generated content)
  • The proportion of various types of content on the page, expressed as a ratio and compared to other known high quality pages. The specific example given for this technique is ‘a web page providing 99% hyperlinks and 1% plain text is more likely to be a low-quality web page than a web page providing 50% hyperlinks and 50% plain text’, but there are likely a number of other content types that could be examined this way. (WARNING: watch this on thin ecommerce product pages and, obviously, site links pages; these should likely be NOINDEX, FOLLOW.)

Any of the above could very well be scored in the Panda algo as well, or in other quality algos.
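The hyperlink-to-text proportion from the patent example above is easy to approximate on your own pages. This is a minimal sketch under my own assumptions: the patent does not say how the ratio is measured, so here it is the share of visible, non-whitespace characters that sit inside `<a>` tags, with `<script>` and `<style>` contents ignored. The names `LinkTextCounter` and `link_text_ratio` are hypothetical, and no threshold is implied; what counts as “too high” has not been published.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts visible text characters inside and outside <a> elements."""

    SKIP = {"script", "style"}  # not visible text

    def __init__(self):
        super().__init__()
        self.link_depth = 0  # > 0 while inside an <a> element
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.link_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        n = len("".join(data.split()))  # count non-whitespace characters only
        if self.link_depth:
            self.link_chars += n
        else:
            self.text_chars += n


def link_text_ratio(html):
    """Fraction of visible (non-whitespace) characters that are link text."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    total = counter.link_chars + counter.text_chars
    return counter.link_chars / total if total else 0.0


if __name__ == "__main__":
    print(link_text_ratio("<p>abcd <a href='/x'>efgh</a></p>"))  # 0.5
    print(link_text_ratio("<a href='/1'>aaaa</a><a href='/2'>bbbb</a>"))  # 1.0
```

A bare links page scores close to 1.0, like the patent’s 99% example. Since the patent describes comparing against known high quality pages, comparing a thin product page’s score with your own best pages is probably more useful than any fixed cut-off.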


NOT a Good OR Bad Factor:
-- author snippet (JM, Dec 2, 2013. ALGO NOW DEPRECATED)
-- malware detected (JM, Apr 25, 2014)
-- HTTPS (a minor, page-based boost that runs constantly and separately from Panda) (JM, Aug 11 and Aug 25, 2014)


Final Thoughts...

Do we know for sure any of these factors continue to be used in Google’s Quality Algorithms positively or negatively?

Nope.

NOR DOES THIS MATTER.

All we can go off of is the evidence that we have, and this is what Google has said about Panda. And so, SEO now is simply a process of RISK MITIGATION. All of these factors will MAKE YOUR SITE BETTER, and there are direct Google admissions that they use them in their algos.

So mitigate your risk, or one day you just won’t rank.

I dare Google to contradict any of these factors, or publish a more accurate Do and Don’t List themselves.

Commence comment war below ;-)
