Google published an important research paper on determining page quality with AI. The details of the algorithm seem remarkably similar to what the helpful content algorithm is known to do.
Contents
- 1 Google Doesn’t Identify Algorithm Technologies
- 2 The Helpful Content Signal
- 3 5. Is the Helpful Content Signal Multiple Things?
- 4 Text Generation Models Can Predict Page Quality
- 5 OpenAI GPT-2 Detector
- 6 AI Detects All Forms of Language Spam
- 7 Results Mirror Helpful Content Update
- 8 Three Language Quality Scores
- 9 The Algorithm is “Powerful”
- 10 Citations
- 11 What is helpful content according to Google?
- 12 What social media use algorithms?
- 13 How is Google helpful to students?
- 14 What happens when I publish my Google site?
Google Doesn’t Identify Algorithm Technologies
No one outside of Google can say with certainty that this research paper is the basis of the helpful content signal.
Google generally does not identify the underlying technology of its various algorithms, such as the Penguin, Panda, or SpamBrain algorithms.
So one cannot say with certainty that this algorithm is the helpful content algorithm; one can only speculate and offer an opinion about it.
But it’s worth a look because the similarities are eye-opening.
The Helpful Content Signal
1. It Improves a Classifier
Google has provided a number of clues about the helpful content signal, but there is still a lot of speculation about what it actually is.
The first clues were in a tweet dated December 6, 2022 announcing a helpful content update.
The tweet said:
“It improves our classifier & works across content globally in all languages.”
A classifier, in machine learning, is something that categorizes data (is it this or is it that?).
2. It’s Not a Manual or Spam Action
The Helpful Content algorithm, according to Google’s explainer (What creators should know about Google’s August 2022 helpful content update), is not a spam action or a manual action.
“This classifier process is entirely automated, using a machine-learning model.
It is not a manual action nor a spam action.”
3. It’s a Ranking Related Signal
The helpful content update explainer says that the helpful content algorithm is a signal used to rank content.
“…it’s just a new signal and one of many signals Google evaluates to rank content.”
4. It Checks if Content is By People
What’s interesting is that the helpful content signal (apparently) checks whether the content was created by people.
Google’s blog post on the Helpful Content Update (More content by people, for people in Search) stated that it is a signal to identify content created by people and for people.
“… we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.
…”
The concept of content being “by people” is repeated three times in the announcement, apparently indicating that it is a quality of the helpful content signal.
And if content is not written “by people” then it is machine-generated, which is an important consideration because the algorithm discussed here is related to the detection of machine-generated content.
5. Is the Helpful Content Signal Multiple Things?
Finally, Google’s blog post seems to indicate that the Helpful Content Update isn’t just one thing, like a single algorithm.
Danny Sullivan writes that it’s a “series of improvements” which, if I’m not reading too much into it, means that it’s not just one algorithm or system but several that together accomplish the task of weeding out unhelpful content.
“…we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.”
Text Generation Models Can Predict Page Quality
What this research paper finds is that large language models (LLMs) such as GPT-2 can accurately identify low-quality content.
The researchers used classifiers trained to recognize machine-generated text and found that the same classifiers were able to identify low-quality text, even though they were not trained to do so.
Large language models can learn to do new things that they were not trained to do.
A Stanford University article on GPT-3 discusses how the model independently learned to translate text from English to French simply because it was given more data to learn from, something that did not happen with GPT-2, which was trained on less data.
The article notes how adding more data causes new behaviors to emerge, a result of what is called unsupervised training.
Unsupervised training is when a machine learns from data without being given labeled examples of the task, so it can end up doing something it was not explicitly trained to do.
The word “emerge” is important because it refers to when a machine learns to do something it was not trained to do.
A Stanford University article on GPT-3 explains:
“Workshop participants said they were surprised that such behavior emerges from simple scaling of data and computational resources and expressed curiosity about what further capabilities would emerge from further scale.”
A new ability emerging is exactly what the research paper describes. The researchers found that a machine-generated text detector can also predict low-quality content.
The researchers write:
“Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of ‘page quality’, able to detect low quality content without any training.
This enables fast bootstrapping of quality indicators in a low-resource setting.
Secondly, curious to understand the prevalence and nature of low quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.”
The takeaway here is that they used a model trained to spot machine-generated content and found that a new behavior emerged: the ability to identify low-quality pages.
OpenAI GPT-2 Detector
The researchers tested two systems to see how well they worked to find low-quality content.
One of the systems used RoBERTa, a pretraining method that is an improved version of BERT.
These are the two systems tested:
- OpenAI’s RoBERTa-based GPT-2 detector
- GLTR LM detector
They found that OpenAI’s GPT-2 detector was superior at detecting low-quality content.
The description of the test results closely mirrors what we know about the helpful content signal.
AI Detects All Forms of Language Spam
The research paper states that there are many signals of quality but that this approach focuses only on linguistic or language quality.
For the purposes of this research paper, the terms “page quality” and “language quality” mean the same thing.
The breakthrough in this research is that they successfully used the OpenAI GPT-2 detector’s prediction of whether something is machine-generated as a score for language quality.
“… documents with high P(machine-written) score tend to have low language quality.
…Machine authorship detection can thus be a powerful proxy for quality assessment.
It requires no labeled examples – only a corpus of text to train on in a self-discriminating fashion.
This is particularly valuable in applications where labeled data is scarce or where the distribution is too complex to sample well.
For example, it is challenging to curate a labeled dataset representative of all forms of low quality web content.”
What this means is that the system does not have to be trained to recognize specific kinds of low-quality content.
It learns to find all the different variations of low quality on its own.
This is a powerful way to identify poor quality pages.
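As a rough illustration of the idea in the quoted passage, the sketch below ranks documents by a detector's P(machine-written) score and treats a high score as a sign of low language quality. Everything named here is an assumption made for illustration: `toy_detector` is a crude stand-in based on word repetition, not the OpenAI GPT-2 detector, and the 0.5 threshold is arbitrary rather than a value from the paper.

```python
from typing import Callable, List, Tuple

# A detector maps text to P(machine-written), a float in [0, 1].
Detector = Callable[[str], float]


def toy_detector(text: str) -> float:
    """Hypothetical stand-in for a real detector: the share of repeated words.

    A real system would call a trained model here; this heuristic only
    keeps the example self-contained and runnable.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return 1.0 - len(set(words)) / len(words)


def rank_by_quality_proxy(
    docs: List[str], detector: Detector
) -> List[Tuple[float, str]]:
    """Return (score, doc) pairs sorted from lowest to highest P(machine-written)."""
    return sorted(((detector(d), d) for d in docs), key=lambda pair: pair[0])


def flag_low_quality(
    docs: List[str], detector: Detector, threshold: float = 0.5
) -> List[str]:
    """Flag documents whose score exceeds an arbitrary, illustrative threshold."""
    return [d for d in docs if detector(d) > threshold]


docs = [
    "The court ruled that the statute applies to online retailers.",
    "buy cheap essay buy cheap essay buy cheap essay now",
]
ranked = rank_by_quality_proxy(docs, toy_detector)
flagged = flag_low_quality(docs, toy_detector)
```

Passing the detector in as a plain function means the toy heuristic could be swapped for a call to a trained model without touching the ranking or flagging logic.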
Results Mirror Helpful Content Update
They tested this system on half a billion web pages, analyzing the pages by attributes such as document length, age of the content, and topic.
The age of the content is not about marking new content as low quality.
They simply analyzed web content by time and found a huge jump in low-quality pages starting in 2019, coinciding with the growing popularity of machine-generated content.
Analysis by topic revealed that certain topic areas, such as legal and government topics, tended to have higher quality pages.
Interestingly, they discovered a large number of low-quality pages in the education space, which they said corresponded to sites that offer essays to students.
What makes that interesting is that education is a topic Google specifically mentioned as being affected by the Helpful Content update.
A Google blog post written by Danny Sullivan shares:
“…our testing has found it will especially improve results related to online education…”
Three Language Quality Scores
Google’s Quality Raters Guidelines (PDF) use four quality scores: low, medium, high, and very high.
The researchers used three quality scores to test the new system, plus one more labeled undefined.
Documents rated as undefined were those that could not be assessed, for whatever reason, and were removed.
The scores are 0, 1, and 2, with 2 being the highest.
These are the descriptions of the Language Quality (LQ) scores:
“0: Low LQ.
Text is incomprehensible or logically inconsistent.
1: Medium LQ.
Text is comprehensible but poorly written (frequent grammatical/syntactical errors).
2: High LQ.
Text is comprehensible and reasonably well-written (infrequent grammatical/syntactical errors).”
Here are the Quality Raters Guidelines’ definitions of low quality:
Lowest Quality:
“MC is created without adequate effort, originality, talent, or skill necessary to achieve the purpose of the page in a satisfying way.
…little attention to important aspects such as clarity or organization.
…Some Low quality content is created with little effort in order to have content to support monetization rather than creating original or effortful content to help users.
Filler content may also be added, especially at the top of the page, forcing users to scroll down to reach the MC.
…The writing of this article is unprofessional, including many grammar and punctuation errors.”
The quality raters guidelines have a more detailed description of low quality than the algorithm does.
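To make the researchers' three-level scale concrete, here is a minimal sketch of how rater scores might be stored and averaged, with undefined ratings dropped as described above. The `LanguageQuality` enum, the use of `None` for an undefined rating, and the `mean_lq` helper are illustrative names, not taken from the paper.

```python
from enum import IntEnum
from typing import Iterable, Optional


class LanguageQuality(IntEnum):
    LOW = 0     # incomprehensible or logically inconsistent
    MEDIUM = 1  # comprehensible but poorly written
    HIGH = 2    # comprehensible and reasonably well written


def mean_lq(ratings: Iterable[Optional[LanguageQuality]]) -> Optional[float]:
    """Average the ratings, skipping undefined (None) entries.

    Returns None when no rating is defined.
    """
    defined = [int(r) for r in ratings if r is not None]
    if not defined:
        return None
    return sum(defined) / len(defined)


ratings = [LanguageQuality.LOW, LanguageQuality.HIGH, None, LanguageQuality.MEDIUM]
average = mean_lq(ratings)
```

An average like this, computed per group of documents, is one simple way to compare human-rated language quality against a detector's scores.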
What is interesting is how the algorithm relies on grammatical and syntactical errors.
Syntax is a reference to the order of words.
Words in the wrong order sound incorrect, similar to how the Yoda character in Star Wars speaks (“Impossible to see, the future is”).
Does the Helpful Content algorithm rely on grammar and syntax signals? If this is the algorithm, then that may play a role (but not the only role).
But I would like to think that the algorithm was improved with some of what is in the quality raters guidelines between the publication of the research in 2021 and the rollout of the helpful content signal in 2022.
The Algorithm is “Powerful”
It’s good practice to read a paper’s conclusions to get an idea of whether the algorithm is good enough to use in search results.
Many research papers end by saying that more research needs to be done or conclude that the improvements are marginal.
The most interesting papers are those that claim new state-of-the-art results.
The researchers say this algorithm is powerful and outperforms the baselines.
What makes it a good candidate for a helpful content type signal is that it is a low-resource algorithm that is web-scale.
In the conclusion they reaffirm the positive results:
“This paper posits that detectors trained to discriminate human vs. machine-written text are effective predictors of webpages’ language quality, outperforming a baseline supervised spam classifier.”
The conclusion of the research paper was positive about the breakthrough and expressed hope that the research will be used by others.
There is no mention of further research being necessary.
This research paper describes a breakthrough in the detection of low-quality web pages.
The conclusion indicates that, in my opinion, there is a likelihood that it could make it into Google’s algorithm.
Because it’s described as a “web-scale” algorithm that can be deployed in a “low-resource setting,” this is the kind of algorithm that could go live and run on a continual basis, just as the helpful content signal is said to do.
We don’t know if this is related to the helpful content update, but it is certainly a breakthrough in the science of detecting low-quality content.
Citations
Google Research Page:
- Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
Download the Google Research Paper:
- Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study (PDF)
- Image courtesy of Shutterstock/Asier Romero
What is helpful content according to Google?
Showing expertise: It’s important to understand what “helpful content” means. According to Google, helpful content is unique, original, and written by people with real knowledge of the subject. For example, say you’re writing a hotel guide for the Maldives.
What is Google’s helpful content? The helpful content system aims to better reward content where visitors feel they have had a satisfying experience, while content that does not meet a visitor’s expectations will not perform as well. The system generates a site-wide signal that Google considers among many other signals for ranking web pages.
How do you write helpful content? 10 tips for writing great content:
- Create a Powerful Headline. Say you get 100 people to visit your blog. …
- Hook Readers With an Interesting Introduction. …
- Write for Your Audience. …
- Narrow the Focus of Your Text. …
- Be Cooperative. …
- Write in Your Unique Brand Voice. …
- Provide the Knowledge Readers Want. …
- Use Schedule.
What does Google look for in content?
How well the content satisfies the user’s query: This is the most important factor in getting Google to rank your content at the top. Google wants content that provides unique solutions to its users’ queries. If you provide the right answer for users, Google will rank you higher.
How does Google decide which results to show? To give you the most useful information, Search algorithms look at many factors and signals, including the words of your query, the relevance and usability of pages, the expertise of sources, and your location and settings. The weight applied to each factor varies depending on the nature of your query.
What does Google look for in a good website?
Be descriptive: Use accurate, descriptive titles for your pages. We recommend placing different topics, products, or services on separate pages, one topic (or closely related set of items) per page. Be thorough: Mention everything you have to offer. Google is smart, but we can’t guess what you don’t tell us.
What makes a good web search? Make the search box visible to users. Put the search box in a logical place. Make sure the search bar is in the same place on all pages so users know exactly where to find it as they navigate through the site. Use microcopy as placeholder text to help the user out.
What 3 things make a good website? Fresh, quality content. Be clear, interesting, and fresh. Use language that is understandable to your audience: avoid jargon, colloquial language, and abbreviations. Explain the “why.” Visitors have a short attention span: spell correctly, be accurate, be relevant, and update regularly.
What does Google see when it crawls my site?
Google looks at your robots.txt file and at a lot of other information on your site, including the page title. The title is set by a tag embedded in the HTML code of your page. The title tag has been called the “single most important on-page SEO factor.”
How does the Google crawler work? Most of our Search index is built through the work of software known as crawlers. These visit publicly accessible web pages and follow links on those pages, just as you would when browsing the web.
How do I know if Google is crawling my website? For a definitive test of whether your URL is appearing, search for the page’s URL on Google. The “Last crawl” date in the Page discovery section shows the date when the page used to generate this information was crawled.
How does Google evaluate content quality?
Overall website content quality: Google evaluates the quality of a publisher or author through E-A-T, and relevance through classical information retrieval methods (such as text analysis) as well as newer machine learning methods (such as RankBrain).
What is Google’s top quality content? It includes images, layout, how everything is presented, page speed, and other details, some of which relate to the user experience and how information is presented to the site visitor.
What social media use algorithms?
Types of Social Media Algorithms. Social media algorithms vary by platform, so you can break them down by social media brand. The major platforms are Facebook, Pinterest, LinkedIn, Twitter, and Instagram.
How will social media algorithms work in 2022? Social media algorithms work by using rules and data to determine what information should be shown to each social media user. The goal is to curate a highly engaging feed that will keep people active on the platform as much as possible.
The algorithm of each social media platform is different, but they are all based on machine learning and a set of data called ranking signals. These are exactly what they sound like: signals used to rank the value of each piece of content for each user.
What is Facebook’s algorithm? User Affinity: The User Affinity part of Facebook’s EdgeRank algorithm looks at the relationship between the user and the content (post/status).
What was the first social media algorithm? Facebook, the world leader in social media, was the first to introduce an algorithm when it launched EdgeRank back in 2009.
Do social media algorithms use artificial intelligence? AI technology is an integral part of the popular networks you use every day, and of marketing on those platforms. Facebook uses advanced machine learning to do everything from recognizing your face in photos to targeting users with advertising.
And, across all social media platforms and every social media post, an AI algorithm or machine learning system controls how the content you create and the ads you buy are placed in front of users, often in ways that are invisible to marketers.
Is the Instagram algorithm AI? Artificial Intelligence (AI) technology is essential to our content review. AI can detect and remove content that violates our Community Guidelines before anyone reports it. Sometimes, our technology sends content to teams of people who carefully review and decide on it.
Is YouTube’s algorithm AI?
Like Netflix, YouTube uses AI to find the best “videos” for viewers (or at least the person whose account is currently logged in).
Does the YouTube algorithm use machine learning? Machine learning (ML) and artificial intelligence (AI) are used on the YouTube platform; but despite their benefits of increased discovery and convenience, there are concerns that algorithms increase bias, spread misinformation, and encourage extreme behavior.
Is Google’s algorithm AI? The Google Algorithm is a complex web of AI-driven algorithms that determine how and where search results appear. No outsider has a complete understanding of how these algorithms work, but many try to learn the features of these algorithms in order to do better SEO.
How is Google helpful to students?
Tools for collaboration, communication, and creativity: See how Google for Education tools and features can help every student learn.
How does Google help learning? Google for Education gives teachers the freedom to spend more time personalizing learning, and less time managing it. Students can learn 21st century problem solving and skills they will use in their future careers, with accessibility features that help each student do their best work.
How does Google affect students? What is the Google Effect? The Google effect, also known as digital amnesia, is the tendency to forget information that is easily found through search engines like Google. We do not keep this information in our memory because we know that it is easily available on the internet.
What is the benefit of Google for students? Training credits. Get free access to training platforms that provide you with step-by-step guidance and hands-on experience with cloud technologies. Access the Google Cloud library at Qwiklabs for hands-on labs. Students receive a total of 200 credits and each lab costs 0 to 15 credits.
What happens when I publish my Google site?
When you click “Publish” on the new Google Sites, you allow other people to view your site. If your organization allows you to publish websites, you will see options to (1) allow anyone on the web to visit your website, and (2) allow your website to appear in search results.
What happens when you publish a site to Google? Once you publish your website, you have two “versions” of it: the published version and the editing version. The published version is the one that everyone will be able to access once you share the website URL with them. The editing version is for you (or your colleagues) to continue working on.
Can you publish Google Sites? On a computer, open a site in new Google Sites. At the top, click Publish. Enter the web address for your site.
How long does it take to publish a website on Google? Depending on how closely you follow the above, Google can find your website within the first four weeks of launch. Do a lot of work, however, and your website can appear on Google within four days. Having trouble getting your site to rank well? Check out Google’s Webmaster tools first.
Does publishing a Google site cost money?
Are Google Sites free? Yes! You can build a Google Site at no cost. Also, since it has no pricing, you get all its features for free.