Atomic Media

Archive for the ‘seo news’ Category

Google won’t index sites that do not work on mobile devices after July 5

Monday, June 3rd, 2024

We thought Google’s mobile-first indexing initiative, which started in 2016, was completed last October. But it won’t really be fully done until after July 5.

Mueller explained:

So if your site is not accessible using a mobile device, then Google “will no longer” index it and thus rank it.

Mobile accessibility is required for Google indexing. Yes, Mueller wrote, “If your site’s content is not accessible at all with a mobile device, it will no longer be indexable.”

This is a long time coming. Google has finally drawn a line in the sand for sites that simply do not render on mobile.

This doesn’t mean Google won’t index your site if it isn’t mobile-friendly. What Google is saying is that if your site simply does not render or load on mobile devices, then Google won’t index it.

If you have a desktop template only, it is fine, assuming the desktop version loads on a mobile device.

Some desktop crawling to continue. Google said that Google still sometimes uses the Googlebot Desktop crawler for product listings and for Google for Jobs. This means you may still see Googlebot Desktop in your server logs and reporting tools.
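To see how much of each crawler still shows up in your logs, a rough tally by user agent is usually enough. The sketch below is a minimal illustration, assuming that Googlebot Smartphone user agents contain both "Googlebot" and "Mobile" while Googlebot Desktop contains "Googlebot" without "Mobile". The function name and sample strings are hypothetical, and user agents can be spoofed, so treat the counts as indicative only.

```python
from collections import Counter

def classify_googlebot(user_agent: str) -> str:
    """Return 'smartphone', 'desktop', or 'other' for a user-agent string.

    Assumption: a Googlebot user agent containing "Mobile" is the smartphone
    crawler; any other Googlebot user agent is treated as desktop.
    """
    if "Googlebot" not in user_agent:
        return "other"
    return "smartphone" if "Mobile" in user_agent else "desktop"

# Hypothetical user agents, as they might appear in a server log.
sample_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

counts = Counter(classify_googlebot(ua) for ua in sample_agents)
print(counts)
```

Note that this simple check also counts specialized crawlers such as Googlebot-Image as "desktop", so refine the matching if you need that distinction.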

Why we care. For most of you, this probably won’t be an issue. But if someone hires you to do some SEO on their site and their site does not load on your Android phone or iPhone, then it may also not be crawled and indexed by Google after July 5. Your goal will be to ensure the site is accessible on mobile devices, and to test it using the Google Search Console URL Inspection tool, to ensure it is rendered.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




How to extract GBP review insights to boost local SEO visibility

Monday, June 3rd, 2024

Since the inception of local business listings, companies have explored various methods to acquire more customer reviews. These reviews provide valuable insights into consumer sentiment, common pain points, and areas for improvement.

While many businesses use paid tools to analyze review data, there are cost-effective methods to extract similar insights, particularly for smaller businesses with limited budgets.

This article:

Analyzing GBP reviews for business insights 

Companies like Yext, Reputation, and Birdeye can analyze top entities mentioned in reviews and offer insight into the sentiment around each of these. However, they can also command quite a large price tag. 

Investing in these tools is essential for businesses managing numerous listings across multiple platforms. However, extracting insights from competitor listings remains costly. Monitoring competitor listings for review insights is often seen as an unjustifiable expense.

Smaller businesses can manage listings cost-effectively by assigning an internal marketing employee, but extracting valuable insights from reviews without using tools is more challenging.

Dig deeper: How to turn your Google Business Profile into a revenue-generating channel

How to extract business insights from GBP listings

Luckily, there is a much more cost-effective method to collect entities from GBP reviews using Pleper’s API service. 

Collect Place IDs for listings

For small batches, using Google’s Place ID demo works well for collecting Place IDs for your business’ listings and local competitor listings.

I’ve found that the following formula works well for searching for these listings: {business name}, {business address}.

Google Place ID finder

For larger batches, I recommend using Google’s Place ID API. Using the above formula as the search query, Place IDs can be quickly and efficiently collected.
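As a rough illustration of the batch approach, the sketch below builds the search query from the formula above and turns it into a request URL. It assumes the Places API "Find Place" endpoint is the one being used, which the text does not specify, and the function names, business details and API key are placeholders. It only constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumption: the Places API "Find Place" endpoint is used for the lookup.
FIND_PLACE_ENDPOINT = (
    "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"
)

def build_place_query(name: str, address: str) -> str:
    """Apply the '{business name}, {business address}' search formula."""
    return f"{name.strip()}, {address.strip()}"

def build_find_place_url(name: str, address: str, api_key: str) -> str:
    """Build a Find Place request URL that asks only for the place_id field."""
    params = {
        "input": build_place_query(name, address),
        "inputtype": "textquery",
        "fields": "place_id",
        "key": api_key,
    }
    return f"{FIND_PLACE_ENDPOINT}?{urlencode(params)}"

# Placeholder business details and API key, for illustration only.
url = build_find_place_url("Example Sports", "123 Main St", "YOUR_API_KEY")
print(url)
```

Requesting only the place_id field keeps each response small, which helps when looping over a long list of listings.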

Use Pleper’s API to collect information on each listing

After each listing’s Place ID has been collected, use Pleper’s Scrape API to retrieve the listing information. Once the data has been retrieved, use a parsing script to extract review topics and assign a value to each topic based on sentiment.

Here is an example script that will do just that:

import pandas as pd

# Map sentiment labels to numeric scores.
SENTIMENT_MAP = {
    'positive': 1,
    'neutral': 0,
    'negative': -1,
}

def extract_review_topics(data):
    """Flatten the review topics in a batch result into one row per topic."""
    topics_list = []

    for entry in data['results']['google/by-profile/information']:
        results = entry.get('results') or {}
        # Listings without review topics contribute no rows.
        for topic in results.get('review_topics') or []:
            topics_list.append({
                'Business Name': results.get('name', 'N/A'),
                'Address': results.get('address', 'N/A'),
                # Uses the profile URL from the request payload as the listing identifier.
                'Place ID': entry['payload']['profile_url'],
                'Topic': topic.get('topic', 'N/A'),
                'Count': topic.get('count', 0),
                # Unknown or missing sentiment labels are scored as neutral (0).
                'Sentiment': SENTIMENT_MAP.get(topic.get('sentiment', 'neutral'), 0),
            })

    return topics_list

# batch_result is assumed to hold the parsed JSON response retrieved from Pleper in the previous step.
topics_data = extract_review_topics(batch_result)
df = pd.DataFrame(topics_data)
print(df)

Now that the data has been properly retrieved from Pleper and parsed into a pandas dataframe, matplotlib (together with a word cloud library such as the wordcloud package) can be used to create a word cloud like the one below:

Google reviews word cloud

Word clouds can be created for individual listings or aggregated data on all of a brand’s listings. Comparing word clouds from your own business listings to those of competitors can lead to truly valuable insights.
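To aggregate across listings, the rows returned by extract_review_topics can be rolled up into one frequency per topic before plotting. This is a minimal sketch using only the standard library. The sample rows are made up, and lowercasing topics so that "Baseball" and "baseball" merge is my own assumption about how you would want them grouped.

```python
from collections import defaultdict

def topic_frequencies(topics_list):
    """Sum the 'Count' of each topic across all rows, ignoring letter case."""
    totals = defaultdict(int)
    for row in topics_list:
        totals[row["Topic"].lower()] += row["Count"]
    return dict(totals)

# Made-up rows in the same shape that extract_review_topics returns.
sample_rows = [
    {"Business Name": "Store A", "Topic": "Baseball", "Count": 12, "Sentiment": 1},
    {"Business Name": "Store B", "Topic": "baseball", "Count": 3, "Sentiment": 0},
    {"Business Name": "Store A", "Topic": "golf clubs", "Count": 7, "Sentiment": 1},
]

freqs = topic_frequencies(sample_rows)
print(freqs)
```

If the wordcloud package is installed, a dictionary like this can be passed to WordCloud().generate_from_frequencies(freqs) and displayed with matplotlib. Filtering the rows by business name first gives you the per-listing version.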

The impact of entities on local 3-pack results

Entities have been highlighted in reviews for some time now; however, I haven’t seen many SEOs attempt to promote revenue-generating entities in review solicitation. 

When possible, entities mentioned in a query are highlighted within the local 3-pack via the listings’ reviews. Typically, this occurs on long-tail queries, where more context is provided to Google on what the searcher is looking for.

To better understand the impact of entities within reviews, let’s analyze the results of two queries: 

Sporting stores near me

Sporting store with baseball near me

Here are a few takeaways from comparing these two queries:

When comparing these results, I couldn’t help but believe that the mention of baseball in Pro Image Sports’ reviews increased its visibility within the 3-pack, so I investigated further.

Looking at the review topics provided by Google for Play It Again Sports, I noticed a high number of reviews for “golfing clubs,” so I changed the query to “Sporting Store with Golf Clubs Near Me.”

Sporting Store with Golf Clubs Near Me

By targeting a topic that is mentioned more frequently within Play It Again Sports’ reviews, they appeared within the local 3-pack.

From this small experiment, it’s clear that review topics (entities) play a role in local 3-pack visibility and a larger one than I once believed.

Dig deeper: How to establish your brand entity for SEO: A 5-step guide

Promoting entities in review solicitation 

Google’s guidelines state that review solicitation should be honest, unbiased and without incentives. Businesses should also avoid review gating.

You can ask for reviews on specific topics and remind customers of the products or services they used. Implementation will vary based on each business’s approach to soliciting reviews.

Add a statement like “Tell us about your experience purchasing baseball equipment from us” above the review link in your solicitation email.

It’s important not to exaggerate in this message to avoid biased customer reviews. For example, avoid saying, “Tell us about your positive experience when purchasing high-quality baseball equipment from us.”

While this statement does not inherently create bias as it does not offer an incentive to leave a positive review, it can be considered manipulative, which does not fully align with Google’s guidelines for reviews to be honest and unbiased.
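One way to keep solicitation copy neutral at scale is to generate the statement from a template and flag loaded wording before it goes out. The sketch below is only an illustration. The function names and the word list are hypothetical, and a short list like this is no substitute for reviewing the message against Google’s guidelines yourself.

```python
# Hypothetical list of loaded words that could steer a customer's review.
LEADING_WORDS = {"positive", "great", "amazing", "high-quality", "best", "excellent"}

def build_review_prompt(product_category: str) -> str:
    """Build a neutral statement that reminds the customer what they bought."""
    return f"Tell us about your experience purchasing {product_category} from us."

def leading_words_in(message: str) -> set:
    """Return any loaded words found in the message (case-insensitive)."""
    words = {word.strip('.,!?"“”').lower() for word in message.split()}
    return words & LEADING_WORDS

neutral = build_review_prompt("baseball equipment")
biased = (
    "Tell us about your positive experience when purchasing "
    "high-quality baseball equipment from us."
)

print(neutral, leading_words_in(neutral))
print(biased, leading_words_in(biased))
```

The product category would typically come from customer purchase data where that integration exists. A generic category works as a fallback.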

Dig deeper: Unleashing the potential of Google reviews for local SEO

Applying review insights for business results

After learning how to analyze listings (your own and competitors’), how review topics (entities) influence local 3-pack visibility and how to increase the number of entities within reviews, it’s time to put it all together to drive business results.

Sharing insights from your business listing’s reviews with the appropriate internal stakeholders is key to helping inform strategic and operational changes. Competitor insights can be a driving force for these changes.

For example, if a competitor barber shop uses hot towels in each haircut and your business does not, this data may help make the case that your business should be doing the same.

Next, work internally to leverage business intelligence (i.e., customer purchase data) within review solicitation efforts to promote entities within reviews. Implementing these efforts will vary depending on a business’s technology stack and ability to integrate data.

A more simplistic approach may be necessary for businesses that lack the ability to integrate data. In these situations, I recommend appending a generic statement within the solicitation communication to identify a specific entity.

An example may be a mechanic shop that appends the following statement to increase the mentions of “mechanics.”

Regardless of the internal approach, reviews are crucial for local SEO and shaping consumer perceptions. As an SEO, you can help your business understand its strengths and weaknesses while working to improve local search visibility.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




Brave launches Search Ads

Sunday, June 2nd, 2024

Brave Search now offers Search Ads on a managed service basis.

Brave Search Ads. Here’s what Brave announced:

What Brave Search Ads look like. Here’s a screenshot:

Available now. Brave Search Ads are available in the U.S., Canada, the UK, France and Germany. Brands must meet certain eligibility requirements – Brave called this a “minimum threshold of eligible ad impressions in their desired region.”

Testing, testing. The launch follows 18 months of testing, Brave said. Amazon Ads Sponsored Products, Dell, Fubo, Insurify, Shutterstock and Thumbtack were among the brands that tested Brave Search Ads.

What Brave is saying. The company is pitching this as a way that brands can augment their paid media strategy and reach “highly qualified—but otherwise unreachable—audiences”:

About Brave Search. Brave has an independent search index (it doesn’t rely on Google or Microsoft to serve its search results). It now has 65 million monthly active users and processes over 10 billion annual searches. For context, Google processes roughly that many searches per day.

The announcement. Brave launches Search Ads in key markets, after successful test phase with leading brands.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




LinkedIn shrinks link previews for organic posts

Saturday, June 1st, 2024

LinkedIn has reduced the size of link preview images for organic posts, while maintaining larger preview images for sponsored content.

Why it matters. The change aims to encourage more native posting on LinkedIn and could prompt more users and brands to pay for sponsored posts to retain larger link previews.

Why we care. The change incentivizes advertisers to pay to promote posts to get more visibility with larger link previews.

Driving the news. As part of this “feed simplification” update announced months ago, LinkedIn is making preview images significantly smaller for organic posts with third-party links.

What they’re saying. “When an organic post becomes a Sponsored Content ad, the small thumbnail preview image shown in the organic post is converted to an image with a minimum of 360 x 640 pixels and a maximum of 2430 x 4320 pixels,” per LinkedIn.

The other side. Some criticize the move as penalizing professionals who don’t have time for constant original posting and rely on sharing third-party content.

The bottom line. Brands and individuals looking for maximum engagement from shared links on LinkedIn may increasingly need to pay for sponsored posts.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




Google to honor new privacy laws and user opt-outs

Saturday, June 1st, 2024

Google is rolling out changes to comply with new state privacy laws and user opt-out preferences across its ads and analytics products in 2024.

The big picture. Google is adjusting its practices as Florida, Texas, Oregon, Montana and Colorado enact new data privacy regulations this year. 

Key updates include:

Restricted Data Processing (RDP) (which is when Google limits how it uses data to only show non-personalized ads) for new state laws:

Honoring global privacy control opt-outs:

Why we care. This change Google is implementing will help keep advertisers on the right side of the law when it comes to privacy. However, the changes could impact ad targeting efficiency and personalization capabilities as more users opt out.

What they’re saying. Here’s what Google told partners in an email:

Bottom line. Google is taking steps to help partners comply with tightening data privacy rules, even if it means limiting its own ad targeting.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




How SEO moves forward with the Google Content Warehouse API leak

Friday, May 31st, 2024

In case you missed it, 2,596 internal documents related to internal services at Google leaked.

A search marketer named Erfan Azimi brought them to Rand Fishkin’s attention, and we analyzed them.

Pandemonium ensued.

As you might imagine, it’s been a crazy 48 hours for us all and I have completely failed at being on vacation.

Naturally, some portion of the SEO community has quickly fallen into the standard fear, uncertainty and doubt spiral.

Reconciling new information can be difficult and our cognitive biases can stand in the way.

It’s valuable to discuss this further and offer clarification so we can use what we’ve learned more productively.

After all, these documents are the clearest look at how Google actually considers features of pages that we have had to date.

In this article, I want to attempt to be more explicitly clear, answer common questions, critiques, and concerns and highlight additional actionable findings.

Finally, I want to give you a glimpse into how we will be using this information to do cutting-edge work for our clients. The hope is that we can collectively come up with the best ways to update our best practices based on what we’ve learned.

Reactions to the leak: My thoughts on common criticisms

Let’s start by addressing what people have been saying in response to our findings. I’m not a subtweeter, so this is to all of y’all and I say this with love.

‘We already knew all that’

No, in large part, you did not.

Generally speaking, the SEO community has operated based on a series of best practices derived from research-minded people from the late 1990s and early 2000s.

For instance, we’ve held the page title in such high regard for so long because early search engines were not full-text and only indexed the page titles.

These practices have been reluctantly updated based on information from Google, SEO software companies, and insights from the community. There were numerous gaps that you filled with your own speculation and anecdotal evidence from your experiences.

If you’re more advanced, you capitalized on temporary edge cases and exploits, but you never knew exactly the depth of what Google considers when it computes its rankings.

You also did not know most of its named systems, so you would not have been able to interpret much of what you see in these documents. So, you searched these documents for the things that you do understand and you concluded that you know everything here.

That is the very definition of confirmation bias.

In reality, there are many features in these documents that none of us knew.

Just like the 2006 AOL search data leak and the Yandex leak, there will be value captured from these documents for years to come. Most importantly, you also just got actual confirmation that Google uses features that you might have suspected. There is value in that if only to act as proof when you are trying to get something implemented with your clients.

Finally, we now have a better sense of internal terminology. One way Google spokespeople evade explanation is through language ambiguity. We are now better armed to ask the right questions and stop living on the abstraction layer.

‘We should just focus on customers and not the leak’

Sure. As an early and continued proponent of market segmentation in SEO, I obviously think we should be focusing on our customers.

Yet we can’t deny that we live in a reality where most of the web has conformed to Google to drive traffic.

We operate in a channel that is considered a black box. Our customers ask us questions that we often respond to with “it depends.”

I’m of the mindset that there is value in having an atomic understanding of what we’re working with so we can explain what it depends on. That helps with building trust and getting buy-in to execute on the work that we do.

Mastering our channel is in service of our focus on our customers.

‘The leak isn’t real’

Skepticism in SEO is healthy. Ultimately, you can decide to believe whatever you want, but here’s the reality of the situation:

  1. Erfan had his Xoogler source authenticate the documentation. 
  2. Rand worked through his own authentication process. 
  3. I also authenticated the documentation separately through my own network and backchannel resources. 

I can say with absolute confidence that the leak is real and has been definitively verified in several ways including through insights from people with deeper access to Google’s systems. 

In addition to my own sources, Xoogler Fili Wiese offered his insight on X. Note that I’ve included his call out even though he vaguely sprinkled some doubt on my interpretations without offering any other information. But that’s a Xoogler for you, amiright?

Finally, the documentation references specific internal ranking systems that only Googlers know about. I touched on some of those systems and cross-referenced their functions with detail from a Google engineer’s resume.

Oh, and Google just verified it in a statement as I was putting my final edits on this. 

“This is a Nothingburger”

No doubt.

I’ll see you on page 2 of the SERPs while I’m having mine medium with cheese, mayo, ketchup and mustard.

“It doesn’t say CTR so it’s not being used”

So, let me get this straight, you think a marvel of modern technology that computes an array of data points across thousands of computers to generate and display results from tens of billions of pages in a quarter of a second that stores both clicks and impressions as features is incapable of performing basic division on the fly?

… OK.

“Be careful with drawing conclusions from this information”

I agree with this. We all have the potential to be wrong in our interpretation here due to the caveats that I highlighted.

To that end, we should take measured approaches in developing and testing hypotheses based on this data.

The conclusions I’ve drawn are based on my research into Google and precedents in Information Retrieval, but like I said it is entirely possible that my conclusions are not absolutely correct.

“The leak is to stop us from talking about AI Overviews”

No.

The misconfigured documentation deployment happened in March. There’s some evidence that this has been happening in other languages (sans comments) for two years.

The documents were discovered in May. Had someone discovered it sooner, it would have been shared sooner.

The timing of AI Overviews has nothing to do with it. Cut it out.

“We don’t know how old it is”

This is immaterial. Based on dates in the files, we know it’s at least newer than August 2023.

We know that commits to the repository happen regularly, presumably as a function of code being updated. We know that much of the docs have not changed in subsequent deployments. 

We also know that when this code was deployed, it featured exactly the 2,596 files we have been reviewing and many of those files were not previously in the repository. Unless whoever/whatever did the git push did so with out-of-date code, this was the latest version at the time.

The documentation has other markers of recency, like references to LLMs and generative features, which suggests that it is at least from the past year.

Either way it has more detail than we have ever gotten before and more than fresh enough for our consideration.

“This all isn’t related to search”

That is correct. I indicated as much in my previous article.

What I did not do was segment the modules into their respective service. I took the time to do that now.

Here’s a quick and dirty classification of the features broadly classified by service based on ModuleName:

Of the 14,000 features, roughly 8,000 are related to Search.

“It’s just a list of variables”

Sure.

It’s a list of variables with descriptions that gives you a sense of the level of granularity Google uses to understand and process the web.

If you care about ranking factors this documentation is Christmas, Hanukkah, Kwanzaa and Festivus.

“It’s a conspiracy! You buried [thing I’m interested in]”

Why would I bury something and then encourage people to go look at the documents themselves and write about their own findings?

Make it make sense.

“This won’t change anything about how I do SEO”

This is a choice and, perhaps, a function of me purposely not being prescriptive with how I presented the findings.

What we’ve learned should at least enhance your approach to SEO strategically in a few meaningful ways and can definitely change it tactically. I’ll discuss that below.

FAQs about the leaked docs

I’ve been asked a lot of questions in the past 48 hours so I think it’s valuable to memorialize the answers here.

What were the most interesting things you found?

It’s all very interesting to me, but here’s a finding that I did not include in the original article:

Google can specify a limit of results per content type.

In other words, they can specify only X number of blog posts or Y number of news articles can appear for a given SERP.

Having a sense of these diversity limits could help us decide which content formats to create when we are selecting keywords to target.

For instance, if we know that the limit is three for blog posts and we don’t think we can outrank any of them, then maybe a video is a more viable format for that keyword.
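To make the idea concrete, here is a minimal sketch of how a per-type cap could reshape a ranked list. It is purely an illustration of the concept, not Google’s implementation. The function name, result fields and limit values are all hypothetical.

```python
def apply_type_limits(ranked_results, limits):
    """Keep the ranked order but cap how many results of each type appear.

    Content types that are missing from `limits` are left uncapped.
    """
    kept = []
    seen = {}
    for result in ranked_results:
        content_type = result["type"]
        cap = limits.get(content_type)
        if cap is not None and seen.get(content_type, 0) >= cap:
            continue  # this type has already used up its slots
        seen[content_type] = seen.get(content_type, 0) + 1
        kept.append(result)
    return kept

# Hypothetical ranked candidates for one query, best first.
ranked = [
    {"url": "a", "type": "blog"},
    {"url": "b", "type": "blog"},
    {"url": "c", "type": "video"},
    {"url": "d", "type": "blog"},
    {"url": "e", "type": "blog"},
    {"url": "f", "type": "news"},
]

visible = apply_type_limits(ranked, {"blog": 3})
print([result["url"] for result in visible])
```

In this toy example the fourth blog post never appears no matter how relevant it is, while the video does, which is the reasoning behind choosing a different format when the blog slots look unwinnable.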

What should we take away from this leak?

Search has many layers of complexity. Even though we have a broader view into things, we don’t know which elements of the ranking systems trigger or why.

We now have more clarity on the signals and their nuances.

What are the implications for local search?

Andrew Shotland is the authority on that. He and his group at LocalSEOGuide have begun to dig into things from that perspective.

What are the implications for YouTube Search?

I have not dug into that, but there are 23 modules with YouTube prefixes.

Someone should definitely do an interpretation of it.

How does this impact the (_______) space?

The simple answer is, it’s hard to know.

An idea that I want to continue to drill home is that Google’s scoring functions behave differently depending on your query and context. Given the evidence we see in how the SERPs function, there are different ranking systems that activate for different verticals.

To illustrate this point, the Framework for evaluating web search scoring functions patent shows that Google has the capability to run multiple scoring functions simultaneously and decide which result set to use once the data is returned.

While we have many of the features that Google is storing, we do not have enough information about the downstream processes to know exactly what will happen for any given space.

That said, there are some indicators of how Google accounts for some spaces like Travel.

The QualityTravelGoodSitesData module has features that identify and score travel sites, presumably to give them a Boost over non-official sites.

Do you really think Google is purposely torching small sites?

I don’t know.

I also don’t know exactly how smallPersonalSite is defined or used, but I do know that there is a lot of evidence of small sites losing most of their traffic and Google is sending less traffic to the long tail of the web.

That’s impacting the livelihood of small businesses. And their outcry seems to have fallen on deaf ears.

Signals like links and clicks inherently support big brands. Those sites naturally attract more links and users are more compelled to click on brands they recognize.

Big brands can also afford agencies like mine and more sophisticated tooling for content engineering so they demonstrate better relevance signals.

It’s a self-fulfilling prophecy and it becomes increasingly difficult for small sites to compete in organic search. 

If the sites in question would be considered “small personal sites” then Google should give them a fighting chance with a Boost that offsets the unfair advantage big brands have.

Do you think Googlers are bad people?

I don’t.

I think they generally are well-meaning folks that do the hard job of supporting many people based on a product that they have little influence over and is difficult to explain.

They also work in a public multinational organization with many constraints. The information disparity creates a power dynamic between them and the SEO community.

Googlers could, however, dramatically improve their reputations and credibility among marketers and journalists by saying “no comment” more often rather than providing misleading, patronizing or belittling responses like the one they made about this leak.

Although it’s worth noting that the PR respondent Davis Thompson has been doing comms for Search for just the last two months and I’m sure he is exhausted.

Is there anything related to AI Overviews?

I was not able to find anything directly related to SGE/AIO, but I have already presented a lot of clarity on how that works.

I did find a few policy features for LLMs. This suggests that Google determines what content can or cannot be used from the Knowledge Graph with LLMs.

Is there anything related to generative AI?

There is something related to video content. Based on the write-ups associated with the attributes, I suspect that they use LLMs to predict the topics of videos.

New discoveries from the leak

Some conversations I’ve had and observed over the past two days have helped me recontextualize my findings – and also dig for more things in the documentation.

Baby Panda is not HCU

Someone with knowledge of Google’s internal systems was able to confirm that Baby Panda references an older system and is not the Helpful Content Update.

I, however, stand by my hypothesis that HCU exhibits similar properties to Panda and it likely requires similar features to improve for recovery.

A worthwhile experiment would be trying to recover traffic to a site hit by HCU by systematically improving click signals and links to see if it works. If someone with a site that’s been struck wants to volunteer as tribute, I have a hypothesis that I’d like to test on how you can recover. 

The leaks technically go back two years

Derek Perkins and @SemanticEntity brought to my attention on Twitter that the leaks have been available across languages in Google’s client libraries for Java, Ruby, and PHP.

The difference with those is that there is very limited documentation in the code.

There is a content effort score maybe for generative AI content

Google is attempting to determine the amount of effort employed when creating content. Based on the definition, we don’t know if all content is scored this way by an LLM, or if it is just content that they suspect is built using generative AI.

Nevertheless, this is a measure you can improve through content engineering.

The significance of page updates is measured

The significance of a page update impacts how often a page is crawled and potentially indexed. Previously, you could simply change the dates on your page and it signaled freshness to Google, but this feature suggests that Google expects more significant updates to the page.
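We don’t know how Google computes this, but you can get a rough sense of how much a page really changed by comparing two versions of its text. The sketch below uses Python’s difflib for that. The function name and sample texts are hypothetical, and the score is only a stand-in for whatever Google actually measures.

```python
import difflib

def change_ratio(old_text: str, new_text: str) -> float:
    """Estimate how much of the text changed between two versions.

    Returns 0.0 for identical texts and approaches 1.0 as fewer words
    are shared. Compares word by word rather than character by character.
    """
    matcher = difflib.SequenceMatcher(None, old_text.split(), new_text.split())
    return 1.0 - matcher.ratio()

# Hypothetical versions of the same page.
original = "Updated January 2023. Our guide to baseball gloves."
date_only = "Updated June 2024. Our guide to baseball gloves."
rewritten = "Updated June 2024. We tested twelve new gloves and ranked them by fit."

print(change_ratio(original, date_only))
print(change_ratio(original, rewritten))
```

On a full-length page, a date-only edit would score very close to zero, which is the kind of update this feature suggests is no longer enough.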

Pages are protected based on earlier links in Penguin

According to the description of this feature, Penguin had pages that were considered protected based on the history of their link profile.

This, combined with the link velocity signals, could explain why Google is adamant that negative SEO attacks with links are ineffective. 

Toxic backlinks are indeed a thing

We’ve heard that “toxic backlinks” are a concept that is simply used to sell SEO software. Yet there is a badbacklinksPenalized feature associated with documents. 

There’s a blog copycat score

In the BlogPerDocData module, there is a copycat score that has no definition but is tied to the docQualityScore.

My assumption is that it is a measure of duplication specifically for blog posts.

Mentions matter a lot

Although I haven’t come across anything suggesting that mentions are treated as links, there are a lot of mentions of mentions as they relate to entities.

This simply reinforces that leaning into entity-driven strategies with your content is a worthwhile addition to your strategy.

Googlebot is more capable than we thought

Googlebot’s fetching mechanism is capable of more than just GET requests.

The documentation indicates that it can do POST, PUT, or PATCH requests as well.

The team previously discussed POST requests, but the other two HTTP verbs weren’t previously revealed. If you see some anomalous requests in your logs, this may be why.
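If you want to check your own logs for these requests, something like the following sketch can help. It assumes your server writes the common "combined" log format and that matching "Googlebot" in the user agent is good enough for a first pass. User agents can be spoofed, so confirm anything surprising before drawing conclusions. The function name and sample log lines are made up.

```python
import re

# Assumption: the access log uses the "combined" format.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

NON_GET_METHODS = {"POST", "PUT", "PATCH"}

def non_get_googlebot_requests(lines):
    """Return (method, path) pairs for POST/PUT/PATCH requests claiming to be Googlebot."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format
        if "Googlebot" in match["agent"] and match["method"] in NON_GET_METHODS:
            hits.append((match["method"], match["path"]))
    return hits

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up log lines for illustration.
sample_log = [
    f'66.249.66.1 - - [03/Jun/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [03/Jun/2024:10:00:05 +0000] "POST /api/search HTTP/1.1" 200 812 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [03/Jun/2024:10:00:09 +0000] "PUT /upload HTTP/1.1" 403 0 "-" "curl/8.0"',
]

print(non_get_googlebot_requests(sample_log))
```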

Specific measures of ‘effort’ for UGC 

We’ve long believed that leveraging UGC is a scalable way to get more content onto pages and improve their relevance and freshness.

This ugcDiscussionEffortScore suggests that Google is measuring the quality of that content separately from the core content. 

When we work with UGC-driven marketplaces and discussion sites, we do a lot of content strategy work related to prompting users to say certain things. That, combined with heavy moderation of the content, should be fundamental to improving the visibility and performance of those sites.

Google detects how commercial a page is

We know that intent is a heavy component of Search, but we only have measures of this on the keyword side of the equation.

Google scores documents this way as well and this can be used to stop a page from being considered for a query with informational intent.

We’ve worked with clients who actively experimented with consolidating informational and transactional page content, with the goal of improving visibility for both types of terms. This worked to varying degrees, but it’s interesting to see the score effectively considered a binary based on this description. 

Cool things I’ve seen people do with the leaked docs

I’m pretty excited to see how the documentation is reverberating across the space. 

Natzir’s Google’s Ranking Features Modules Relations: Natzir built a network graph visualization tool in Streamlit that shows the relationships between modules.

WordLift’s Google Leak Reporting Tool: Andrea Volpini built a Streamlit app that lets you ask custom questions about the documents to get a report. 

Direction on how to move forward in SEO

The power is in the crowd and the SEO community is a global team.

I don’t expect us to all agree on everything I’ve reviewed and discovered, but we are at our best when we build on our collective expertise.

Here are some things that I think are worth doing.

How to read the documents

If you haven’t had the chance to dig into the documentation on HexDocs or you’ve tried and don’t know where to start, worry not, I’ve got you covered. 

One thing that annoys me about HexDocs is how the left sidebar covers most of the names of the modules. This makes it difficult to know what you’re navigating to. 

If you don’t want to mess with the CSS, I’ve made a simple Chrome extension that you can install to make the sidebar bigger. 

How your approach to SEO should change strategically

Here are some strategic things that you should more seriously consider as part of your SEO efforts.

If you are already doing all these things, you were right, you do know everything, and I salute you.

SEO and UX need to work more closely together

With NavBoost, Google is valuing clicks as one of the most important features, but we need to understand what session success means.

A search that yields a click on a result where the user does not perform another search can be a success even if they did not spend a lot of time on the site. That can indicate that the user found what they were looking for.

Naturally, a search that yields a click and a user spends 5 minutes on a page before coming back to Google is also a success. We need to create more successful sessions.

SEO is about driving people to the page, UX is about getting them to do what you want on the page. We need to pay closer attention to how components are structured and surfaced to get people to the content that they are explicitly looking for and give them a reason to stay on the site.

It’s not enough to hide what I’m looking for after a story about your grandma’s history of making apple pies with hatchets (or whatever recipe sites are doing these days). Rather, it should be more about providing the exact information, clearly displaying it, and enticing the user to remain on the page with something additionally compelling.

Pay more attention to click metrics

We treat Search Analytics data as outcomes, but Google’s ranking systems treat them as diagnostic features.

If you rank highly and you have a ton of impressions and no clicks (aside from when SiteLinks throws the numbers off) you likely have a problem.

What we are definitively learning is that there is a threshold of expectation for performance based on position. When you fall below that threshold you can lose that position.
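One way to act on this is to compare each query's click-through rate against what you would expect for its position. The sketch below is an assumption-heavy illustration: the expected CTR values, the tolerance and the row format are all hypothetical, and the real thresholds Google uses are not in the documents. Build the expected curve from your own Search Analytics data.

```python
# Hypothetical expected CTR by rounded average position.
# Replace with a curve derived from your own data.
EXPECTED_CTR = {1: 0.25, 2: 0.13, 3: 0.09, 4: 0.06, 5: 0.04}

def underperforming_queries(rows, tolerance=0.5):
    """Return queries whose CTR is below tolerance * expected CTR for their position."""
    flagged = []
    for row in rows:
        position = round(row["position"])
        expected = EXPECTED_CTR.get(position)
        if expected is None or row["impressions"] == 0:
            continue  # no benchmark for this position, or nothing to measure
        ctr = row["clicks"] / row["impressions"]
        if ctr < tolerance * expected:
            flagged.append((row["query"], position, round(ctr, 4)))
    return flagged

# Made-up rows shaped like a Search Analytics export.
rows = [
    {"query": "blue widgets", "position": 1.2, "impressions": 10000, "clicks": 300},
    {"query": "widget repair", "position": 2.8, "impressions": 4000, "clicks": 400},
]
print(underperforming_queries(rows))
```

Here "blue widgets" is flagged because a 3% CTR in position 1 is far below the assumed benchmark, while "widget repair" is not.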

Content needs to be more focused

We’ve learned definitively that Google uses vector embeddings to determine how far off a given page is from the rest of what you talk about.

This indicates that it will be challenging to go far into upper funnel content successfully without a structured expansion or without authors who have demonstrated expertise in that subject area.
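To get a feel for the idea, you can approximate it on your own site: embed each page, average the vectors into a site-level centroid, and see which pages sit far from it. This is a rough sketch, not Google's method. The three-dimensional vectors, the function names and the 0.6 threshold are all assumptions for illustration; in practice you would use embeddings from a model of your choice.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def site_centroid(page_vectors):
    """Element-wise mean of all page vectors."""
    count = len(page_vectors)
    return [sum(values) / count for values in zip(*page_vectors)]

def off_topic_pages(pages, threshold=0.6):
    """Return URLs whose embedding is less similar to the site centroid than threshold."""
    centroid = site_centroid(list(pages.values()))
    return [url for url, vec in pages.items() if cosine_similarity(vec, centroid) < threshold]

# Toy embeddings: two SEO pages and one recipe page.
pages = {
    "/seo/title-tags": [0.9, 0.1, 0.0],
    "/seo/link-building": [0.8, 0.2, 0.0],
    "/recipes/apple-pie": [0.0, 0.1, 0.9],
}
print(off_topic_pages(pages))
```

With these toy vectors only the recipe page falls below the threshold, which is the kind of outlier a focused content strategy would try to avoid or support with a structured expansion.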

Encourage your authors to cultivate expertise in what they publish across the web and treat their bylines like the gold standard that they are.

SEO should always be experiment-driven

Due to the variability of the ranking systems, you cannot take best practices at face value for every space. You need to test, learn and build experimentation in every SEO program.

Large sites leveraging products like the SEO split-testing tool SearchPilot are already on the right track, but even small sites should test how they structure and position their content and metadata to encourage stronger click metrics.

In other words, we need to actively test the SERP, not just the site.

Pay attention to what happens after they leave your site

We now have verification that Google is using data from Chrome as part of the search experience. There is value in reviewing the clickstream data from SimilarWeb and Semrush.

Trends in that data let you see where people are going next and how you can give them that information without them leaving you.

Build keyword and content strategy around SERP format diversity

Google potentially limits the number of pages of certain content types ranking in the SERP, so checking the SERPs should become part of your keyword research.

Don’t align formats with keywords if there’s no reasonable possibility of ranking.

How your approach to SEO should change tactically

Tactically, here are some things you can consider doing differently. Shout out to Rand because a couple of these ideas are his.

Page titles can be as long as you want

We now have further evidence that the 60-70 character limit is a myth.

In my own experience we have experimented with appending more keyword-driven elements to the title and it has yielded more clicks because Google has more to choose from when it rewrites the title.

Use fewer authors on more content

Rather than using an array of freelance authors, you should work with fewer writers who are more focused on subject matter expertise and who also write for other publications.

Focus on link relevance from sites with traffic

We’ve learned that link value is higher from pages that are prioritized higher in the index. Pages that get more clicks are pages that are likely to appear in Google’s flash memory.

We’ve also learned that Google highly values relevance. We need to stop going after link volume and solely focus on relevance.

Default to originality instead of long form

We now know originality is measured in multiple ways and can yield a boost in performance.

Some queries simply don’t require a 5,000-word blog post (I know, I know). Focus on originality and layer more information in your updates as competitors begin to copy you.

Make sure all dates associated with a page are consistent

It’s common for dates in schema to be out of sync with dates on the page and dates in the XML sitemap. All of these need to be synced to ensure Google has the best understanding of how old the content is.

As you refresh your decaying content, make sure every date is aligned so Google gets a consistent signal.
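A simple audit can catch mismatches before Google does. The sketch below assumes you have already extracted the three dates for a page into a dictionary; the field names are hypothetical, and the rule of treating the most recent date as the reference is my own assumption.

```python
from datetime import date

def inconsistent_dates(page):
    """Return the date fields that disagree with the most recent date found for the page."""
    fields = {name: value for name, value in page.items() if isinstance(value, date)}
    if not fields:
        return []
    latest = max(fields.values())
    return sorted(name for name, value in fields.items() if value != latest)

# Hypothetical extracted values for one URL.
page = {
    "url": "/guide",
    "schema_date_modified": date(2024, 5, 30),
    "on_page_date": date(2023, 1, 12),
    "sitemap_lastmod": date(2024, 5, 30),
}
print(inconsistent_dates(page))
```

In this example the visible on-page date is the one that lags behind the schema and sitemap values.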

Use old domains with extreme care

If you’re looking to use an old domain, it’s not enough to buy it and slap your new content on its old URLs. You need to take a structured approach to updating the content to phase out what Google has in its long-term memory.

You may even want to avoid there being a transfer of ownership in registrars until you’ve systematically established the new content.

Make gold-standard documents

We now have evidence that quality raters are doing feature engineering for Google engineers to train their classifiers. You want to create content that quality raters would score as high quality so your content has a small influence over the next core update. 

Bottom line

It’s shortsighted to say nothing should change. Based on this information, I think it’s time for us to reconsider our best practices.

Let’s keep what works and dump what’s not valuable. Because, I tell you what, there’s no text-to-code ratio in these documents, but several of your SEO tools will tell you your site is falling apart because of it.

How Google can repair its relationship with the SEO community

A lot of people have asked me how we can repair our relationship with Google moving forward.

I would prefer that we get back to a more productive space to improve the web. After all, we are aligned in our goals of making search better.

I don’t know that I have a complete solution, but I think an apology and owning their role in misdirection would be a good start. I have a few other ideas that we should consider.

Everybody keep up the good work

I’ve seen some fantastic things come out of the SEO community in the past 48 hours.

I’m energized by the fervor with which everyone has consumed this material and offered their takes – even when I don’t agree with them. This type of discourse is healthy and what makes our industry special.

I encourage everyone to keep going. We’ve been training our whole careers for this moment.

Editor’s note: Join Mike King and Danny Goodwin at SMX Advanced for a late-breaking session exploring the leak and its implications. Learn more here.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




Google explains how it is improving its AI Overviews

Friday, May 31st, 2024

Google responded to the bad press surrounding its recently rolled out AI Overviews in a new blog post by its new head of Search, Liz Reid. Google explained how AI Overviews work, where the weird AI Overviews came from, and the improvements it has made and will continue to make.

However, Google said searchers “have higher satisfaction with their search results, and they’re asking longer, more complex questions that they know Google can now help with,” and basically, these AI Overviews are not going anywhere.

As good as featured snippets. Google said the AI Overviews are “highly effective” and, based on its internal testing, the “accuracy rate for AI Overviews is on par with featured snippets.” Featured snippets also use AI, Google said numerous times.

No hallucinations. AI Overviews generally don’t hallucinate, Google’s Liz Reid wrote. The AI Overviews don’t “make things up in the ways that other LLM products might,” she added. AI Overviews typically only go wrong because of “misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available,” she wrote.

Why the “odd results.” Google explained that it tested AI Overviews “extensively” before releasing it and was comfortable releasing it. But Google said that people tried to get the AI Overviews to return odd results. “We’ve also seen nonsensical new searches, seemingly aimed at producing erroneous results,” Google wrote.

Also, Google wrote that people faked a lot of the examples by manipulating screenshots to show fake AI responses. “Those AI Overviews never appeared,” Google said.

Some odd examples did come up, and Google will make improvements in those types of cases. Google will not manually adjust AI Overviews but rather improve the models so they work across many more queries. “We don’t simply ‘fix’ queries one by one, but we work on updates that can help broad sets of queries, including new ones that we haven’t seen yet,” Google wrote.

Google spoke about “data voids,” which we covered numerous times here. The example, “How many rocks should I eat?” was a query no one had searched for before and for which there was no really good content. Google explained, “However, in this case, there is satirical content on this topic … that also happened to be republished on a geological software provider’s website. So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.”

Improvements to AI Overviews. Google shared some of the improvements it has made to AI Overviews, explaining it will continue to make improvements going forward. Here is what Google said it has done so far:

Finally, “We’ll keep improving when and how we show AI Overviews and strengthening our protections, including for edge cases, and we’re very grateful for the ongoing feedback,” Liz Reid ended with.

Why we care. It sounds like AI Overviews are not going anywhere and Google will continue to show them to searchers and roll them out to more countries and users in the future. You can expect them to get better over time, as Google continues to hear feedback and improve its systems.

Until then, I am sure we will find more examples of inaccurate and sometimes humorous AI Overviews, similar to what we saw when featured snippets initially launched.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




Google adds opt-in for Video Enhancements for ads

Friday, May 31st, 2024

Google is allowing more advertisers to opt into new “video enhancements” for video ad campaigns. These aim to improve performance by automatically creating additional video formats from advertisers’ original assets.

Why we care. With more video consumption happening on mobile devices, having ads properly formatted for different screen sizes and orientations is crucial. The new enhancements use AI to resize and reformat videos to be more effective.

How it works. Advertisers upload a standard horizontal video as they normally would.

Google’s AI automatically generates additional versions in different aspect ratios like square (1:1) and vertical (9:16, 4:5).

It intelligently crops and repositions the video to preserve key elements.

Shorter video clips are also automatically created by cutting down long videos to highlight key moments.

Image credit: Thomas Eccel

The benefits. According to Google, this:

How to enable/disable.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




Unpacking Google’s massive search documentation leak

Friday, May 31st, 2024


A massive Google Search internal ranking documentation leak has sent shockwaves through the SEO community. The leak, which exposed over 14,000 potential ranking features, provides an unprecedented look under the hood of Google’s closely guarded search rankings system.

A man named Erfan Azimi shared a Google API doc leak with SparkToro’s Rand Fishkin, who, in turn, brought in Michael King of iPullRank, to get his help in distributing this story.

The leaked files originated from a Google API document commit titled “yoshi-code-bot /elixer-google-api,” which means this was neither a hack nor the work of a whistle-blower.

SEOs typically occupy three camps:

I suspect many people will be changing their camp after this leak.

You can find all the files here, but you should know that over 14,000 possible ranking signals/features exist, and it’ll take you an entire day (or, in my case, night) to dig through everything.

I’ve read through the entire thing and distilled it into a 40-page PDF that I’m now converting into a summary for Search Engine Land.

While I provide my thoughts and opinions, I’m also sharing the names of the specific ranking features so you can search the database on your own. I encourage everyone to make their own conclusions.

Key points from Google Search document leak

Why is Google specifically filtering for personal blogs / small sites? Why did Google publicly say on many occasions that they don’t have a domain or site authority measurement?

Why did Google lie about their use of click data? Why does Google have seven types of PageRank?

I don’t have the answers to these questions, but they are mysteries the SEO community would love to understand.

Things that stand out: Favorite discoveries

Google has something called pageQuality (PQ). One of the most interesting parts of this measurement is that Google is using an LLM to estimate “effort” for article pages. This value sounds helpful for Google in determining whether a page can be replicated easily. 

Takeaway: Tools, images, videos, unique information and depth of information stand out as ways to score high on “effort” calculations. Coincidentally, these things have also been proven to satisfy users.

Topic borders and topic authority appear to be real

Topical authority is a concept based on Google’s patent research. If you’ve read the patents, you’ll see that many of the insights SEOs have gleaned from patents are supported by this leak.

In the algo leak, we see that siteFocusScore, siteRadius, siteEmbeddings and pageEmbeddings are used for ranking.

What are they?

Source: Topic embeddings data module

Why is this interesting?

Remember when I said PageRank is deprecated? I believe nearest seed (NS) can apply in the realm of topical authority. 

NS focuses on a localized subset of the network around the seed nodes. Proximity and relevance are key focus areas. It can be personalized based on user interest, ensuring pages within a topic cluster are considered more relevant without using the broad web-wide PageRank formula.

Another way of approaching this is to apply NS and PQ (page quality) together. 

By using PQ scores as a mechanism for assisting the seed determination, you could improve the original PageRank algorithm further. 

On the opposite end, we could apply this to lowQuality (another score from the document). If a low-quality page links to other pages, then the low quality could taint the other pages by seed association. 

A seed isn’t necessarily a quality node. It could be a poor-quality node. 
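To make the seed idea concrete, here is a toy sketch of measuring how close each page is to its nearest seed in a link graph. It is only an illustration of my speculation above, not anything described in the leak: the graph, the seed choice and the function name are all hypothetical, and it uses a plain multi-source breadth-first search.

```python
from collections import deque

def distance_to_nearest_seed(links, seeds):
    """Multi-source BFS: hop count from the closest seed to every reachable page."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path in BFS
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical link graph: page -> pages it links to.
links = {"seed": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
print(distance_to_nearest_seed(links, ["seed"]))
```

The same traversal works whether the seed is a trusted page or a known low-quality one; only the interpretation of "close to the seed" changes.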

When we apply site2Vec and the knowledge of siteEmbeddings, I think the theory holds water. 

If we extend this beyond a single website, I imagine variants of Panda could work in this way. All that Google needs to do is begin with a low-quality cluster and extrapolate pattern insights. 

What if NS could work together with OnsiteProminence (score value from the leak)?

In this scenario, nearest seed could identify how closely certain pages relate to high-traffic pages. 

Image quality

ImageQualityClickSignals indicates that image quality is measured by clicks (usefulness, presentation, appealingness, engagingness). These signals are considered Search CPS Personal data.

No idea whether appealingness or engagingness are words – but it’s super interesting! 

Host NSR

I believe NSR is an acronym for Normalized Site Rank.

Host NSR is site rank computed for host-level (website) sitechunks. This value encodes nsr, site_pr and new_nsr. Important to note that nsr_data_proto seems to be the newest version of this but not much info can be found.

In essence, a sitechunk is taking chunks of your domain and you get site rank by measuring these chunks. This makes sense because we already know Google does this on a page-by-page, paragraph and topical basis. 

It almost seems like a chunking system designed to poll random quality metric scores rooted in aggregates. It’s kinda like a pop quiz (rough analogy).

NavBoost

I’ll discuss this more, but it is one of the ranking pieces most mentioned in the leak. NavBoost is a re-ranking based on click logs of user behavior. Google has denied this many times, but a recent court case forced them to reveal that they rely quite heavily on click data. 

The most interesting part (which should not come as a surprise) is that Chrome data is specifically used. I imagine this extends to Android devices as well.

This would be more interesting if we brought in the patent for the site quality score. Links have a ratio with clicks, and we see quite clearly in the leak docs that topics, links and clicks have a relationship. 

While I can’t make conclusions here, I know what Google has shared about the Panda algorithm and what the patents say. I also know that Panda, Baby Panda and Baby Panda V2 are mentioned in the leak. 

If I had to guess, I’d say that Google uses the referring domain and click ratio to determine score demotions. 

HostAge

Nothing about a website’s age is considered in ranking scores, but the hostAge is mentioned regarding a sandbox. The data is used in Twiddler to sandbox fresh spam during serving time. 

I consider this an interesting finding because many SEOs argue about the sandbox and many argue about the importance of domain age. 

As far as the leak is concerned, the sandbox is for spam and domain age doesn’t matter.

ScaledIndyRank. Independence rank. Nothing else is mentioned, and the ExptIndyRank3 is considered experimental. If I had to guess, this has something to do with information gain on a sitewide level (original content).

Note: It is important to remember that we don’t know to what extent Google uses these scoring factors. The majority of the algorithm is a secret. My thoughts are based on what I’m seeing in this leak and what I’ve read by studying three years of Google patents. 

How to remove Google’s memory of an old version of a document

This is perhaps a bit of conjecture, but the logic is sound. According to the leak, Google keeps a record of every version of a webpage. This means Google has an internal web archive of sorts (Google’s own version of the Wayback Machine). 

The nuance is that Google only uses the last 20 versions of a document. If you update a page, wait for a crawl and repeat the process 20 times, you will effectively push out certain versions of the page. 

This might be useful information, considering that the historical versions are associated with various weights and scores.

Remember that the documentation has two forms of update history: significant update and update. It is unclear whether significant updates are required for this sort of version memory tomfoolery.
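As a toy model of that rolling window, a fixed-length queue behaves the same way: once it is full, every new version pushes the oldest one out. This is only a sketch of the conjecture above. The 20-version limit is the figure quoted from the leak, and how Google actually stores or weights versions is not known.

```python
from collections import deque

MAX_VERSIONS = 20  # figure quoted above; actual behavior is not confirmed

history = deque(maxlen=MAX_VERSIONS)
for version in range(1, 26):  # simulate 25 crawled updates
    history.append(f"v{version}")

print(len(history), history[0], history[-1])  # 20 v6 v25
```

After 25 updates, versions v1 through v5 are no longer in the window.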

Google Search ranking system

While it’s conjecture, one of the most interesting things I found was the term weight (literal size).

This would indicate that bolding your words or the size of the words, in general, has some sort of impact on document scores.


Index storage mechanisms

Interestingly, the standard hard drive is used for irregularly updated content.



Google’s indexer now has a name: Alexandria

Go figure. Google would name the largest index of information after the most famous library. Let’s hope the same fate does not befall Google.

Two other indexers are prevalent in the documentation: SegIndexer and TeraGoogle.


Did we just confirm seed sites or sitewide authority?

The section titled “GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData” mentions a factor named isElectionAuthority. The leak says, “Bit to determine whether the site has the election authority signal.”

This is interesting because it might be what people refer to as “seed sites.” It could also be topical authorities or websites with a PageRank of 9/10 (Note: toolbarPageRank is referenced in the leak).

It’s important to note that nsrIsElectionAuthority (a slightly different factor) is considered deprecated, so who knows how we should interpret this.

This specific section is one of the most densely packed sections in the entire leak. 

Short content can rank

Surprise, surprise! Short content does not equal thin content. I’ve been trying to prove this with my cocktail recipe pages, and this leak confirms my suspicion.

Interestingly enough, short content has a different scoring system applied to it (not entirely unique but different to an extent). 

Fresh links seem to trump existing links

This one was a bit of a surprise, and I could be misunderstanding things here. According to freshdocs, a link value multiplier, links from newer webpages are better than links inserted into older content.

Obviously, we must still incorporate our knowledge of a high-value page (mentioned throughout this presentation).

Still, I had this one wrong in my mind. I figured the age would be a good thing, but in reality, it isn’t really the age that gives a niche edit value, it’s the traffic or internal links to the page (if you go the niche edit route).

This doesn’t mean niche edits are ineffective. It simply means that links from newer pages appear to get an unknown value multiplier.

Quality NsrNsrData

Here is a list of some scoring factors that stood out most from the NsrNsrData document.

NSR and Qstar

It seems like site authority and a host of NSR-related scores are all applied in Qstar. My best guess is that Qstar is the aggregate measurement of a website’s scores. It likely includes authority as just one of those aggregate values. 

Scoring in the absence of measurement

nsrdataFromFallbackPatternKey. If NSR data has not been computed for a chunk, then data comes from an average of other chunks from the website. Basically, you have chunks of your site that have values associated with them and these values are averaged and applied to the unknown document.

Google is making scores based on topics, internal links, referring domains, ratios, clicks and all sorts of other things. If normalized site rank hasn’t been computed for a chunk (Google used chunks of your website and pages for scoring purposes), the existing scores associated with other chunks will be averaged and applied to the unscored chunk. 
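The fallback described above can be pictured with a few lines of code. This is a simplified sketch under my own assumptions: the chunk names and scores are invented, and a plain arithmetic mean stands in for whatever averaging Google really applies.

```python
def chunk_score(chunk, scores):
    """Return the chunk's own score, or the mean of the site's scored chunks as a fallback."""
    if scores.get(chunk) is not None:
        return scores[chunk]
    known = [value for value in scores.values() if value is not None]
    return sum(known) / len(known) if known else None

# Hypothetical per-chunk scores; None means not yet computed.
scores = {"/blog": 0.75, "/shop": 0.5, "/forum": 0.25, "/new-section": None}

print(chunk_score("/blog", scores))         # 0.75
print(chunk_score("/new-section", scores))  # 0.5
```

Notice how the weak "/forum" chunk drags down the score inherited by the unscored section, which is why consistency matters.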

I don’t think you can optimize for this, but one thing has been made abundantly clear:

You need to really focus on consistent quality, or you’ll end up hurting your SEO scores across the board by lowering your score average or topicality.

Demotions to watch out for

Much of the content from the leak focused on demotions that Google uses. I find this as helpful (maybe even more helpful) as the positive scoring factors.


Key points:

It’s important to note that click satisfaction scores aren’t based on dwell time. If you continue searching for information NavBoost deems to be the same, you’ll get the scoring demotion.

A unique part of NavBoost is its role in bundling queries based on interpreted meaning. 

NavBoost based on links and user signals

Spam

Anchor text

How is no one talking about this one? An entire page dedicated to anchor text observation, measurement, calculation and assessment.

Source: Anchor spam info data module

At the end of it all, you get spam probability and a spam penalty. 

Here’s a big spoonful of unfairness, and it doesn’t surprise any SEO veterans.

trustedTarget is a metric associated with spam anchors, and it says “True if this URL is on trusted source.” 

When you become “trusted” you can get away with more, and if you’ve investigated these “trusted sources,” you’ll see that they get away with quite a bit.

On a positive note, Google has a Trawler policy that essentially appends “spam” to known spammers, and most crawls auto-reject spammers’ IPs.

9 pieces of actionable advice to consider

The unified theory of ranking: Only using leaked factors

This is not a perfect depiction of Google’s algorithm, but it’s a fun attempt to consolidate the factors and express the leak as a mathematical formula (minus the precise weights). 


Definitions and metrics

R: Overall ranking score

UIS (User Interaction Scores)


CQS (Content Quality Scores)


LS (Link Scores)


RB (Relevance Boost): Relevance boost based on query and content match


QB (Quality Boost): Boost based on overall content and site quality


CSA (Content-Specific Adjustments): Adjustments based on specific content features on SERP and on page


Full formula

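$$
\begin{aligned}
R = {} & \big[ (w_1 \cdot \text{UgcScore} + w_2 \cdot \text{TitleMatchScore} + w_3 \cdot \text{ChromeInTotal} + w_4 \cdot \text{SiteImpressions} \\
& \quad + w_5 \cdot \text{TopicImpressions} + w_6 \cdot \text{SiteClicks} + w_7 \cdot \text{TopicClicks}) \\
& + (v_1 \cdot \text{ImageQualityClickSignals} + v_2 \cdot \text{VideoScore} + v_3 \cdot \text{ShoppingScore} + v_4 \cdot \text{PageEmbedding} \\
& \quad + v_5 \cdot \text{SiteEmbedding} + v_6 \cdot \text{SiteRadius} + v_7 \cdot \text{SiteFocus} + v_8 \cdot \text{TextConfidence} + v_9 \cdot \text{EffortScore}) \\
& + (x_1 \cdot \text{TrustedAnchors} + x_2 \cdot \text{SiteLinkIn} + x_3 \cdot \text{PageRank}) \big] \\
& \times (\text{TopicEmbedding} + \text{QnA} + \text{STS} + \text{SAS} + \text{EFTS} + \text{FS}) \\
& + (y_1 \cdot \text{CDS} + y_2 \cdot \text{SDS} + y_3 \cdot \text{EQSS})
\end{aligned}
$$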

Generalized scoring overview

Generalized Formula: [(User Interaction Scores + Content Quality Scores + Link Scores) x (Relevance Boost + Quality Boost) + X (content-specific score adjustments)] – (Demotion Score Aggregate)
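For anyone who wants to play with the generalized formula, it translates directly into a small function. This is a sketch of the formula as written above and nothing more: the parameter names are mine, the input values are made up, and the real weights remain unknown.

```python
def ranking_score(user_interaction, content_quality, links,
                  relevance_boost, quality_boost,
                  content_adjustments=0.0, demotions=0.0):
    """[(UIS + CQS + LS) x (RB + QB) + content-specific adjustments] - demotions."""
    base = user_interaction + content_quality + links
    return base * (relevance_boost + quality_boost) + content_adjustments - demotions

# Made-up inputs purely to show the arithmetic.
score = ranking_score(
    user_interaction=2.0, content_quality=3.0, links=1.0,
    relevance_boost=0.5, quality_boost=0.5,
    content_adjustments=0.25, demotions=1.0,
)
print(score)  # 5.25
```

Because the boosts multiply the base, a page with weak relevance or quality boosts gains little from strong interaction, content or link scores, while demotions are subtracted at the very end.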

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing




PayPal launching ad network fueled by user purchase data

Thursday, May 30th, 2024

PayPal is building an advertising business that will leverage the troves of data it collects on consumer purchases and spending habits.

What’s happening? The digital payments giant plans to create an ad network that allows merchants and brands to target PayPal’s roughly 400 million users with personalized promotions and ads based on their transaction histories.

Why we care. PayPal has purchasing data from roughly 400 million users, which could enable sophisticated targeting across multiple channels from a single platform, since PayPal plans to serve ads beyond its own properties.

Key hires.

Details. PayPal already offers an “Advanced Offers” ad product that uses AI to serve PayPal users with targeted discounts from merchants whenever they make a purchase.

What they’re saying. PayPal says users can opt out of having their data included in the ad targeting.

Between the lines. The move follows other finance giants like JPMorgan Chase entering the retail media ad space by monetizing their customer data.

PayPal’s ad business is still nascent and may struggle to move the needle for the fintech company whose core payments processing business has higher profit margins.

The big picture. PayPal’s ad ambitions come as the company aims to rebound from recent struggles, including major layoffs and a stock slide after forecasting muted profit growth this year.

Courtesy of Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing



