Ever wonder why a specific image or text is displayed when you create a Link Post? To understand how the image and text are selected, let’s first chat about scraping and how it works.
What is scraping?
All social media platforms use a behind-the-scenes process to gather images and text from hyperlinks and convert them into shareable “preview cards” or “content cards.” These content cards are made from three elements: an image, a title, and a description.
Crawling the webpage to collect these three elements is known as “scraping.” The scraping process reads a webpage, finds the relevant pieces of content, and assembles them into a content card. In the example below, Twitter scraped this link and found the image, title, and description to create this lovely content card:
Social media platforms modify the appearance of the content card to match their unique style, but the information they gather behind the scenes is often identical.
How does scraping work?
When creating content using a hyperlink on EveryoneSocial or any other social media website, the crawling service (often referred to as a “bot”) visits the webpage and reads the content.
Instead of “reading” the website for images and text like humans do, the bot scans the webpage’s data behind the scenes and looks for specific properties. These meta properties, sometimes called “tags,” tell the bot what content is suitable for use in the content card.
Meta properties work like tags that say, “Hey bot, use this image, this title, and this description to create a content card!” Without these properties in place, the bot will attempt to guess which images and content might be suitable for those fields—sometimes with mixed results.
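To make the idea concrete, here is a minimal sketch of how a bot might read meta properties from a page's markup. It uses Python's standard `html.parser` module; the class name `MetaPropertyParser`, the helper `read_meta_properties`, and the sample page are illustrative assumptions, not EveryoneSocial's actual implementation.

```python
from html.parser import HTMLParser


class MetaPropertyParser(HTMLParser):
    """Collects <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # The bot ignores everything except <meta> tags.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("property")
        content = attr_map.get("content")
        if name and content:
            # Keep the first value seen for each property.
            self.properties.setdefault(name, content)


def read_meta_properties(html):
    """Return a dict of meta property names to their content values."""
    parser = MetaPropertyParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example Title" />
<meta property="og:image" content="https://example.com/cover.png" />
<meta property="og:description" content="A short summary of the page." />
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    for name, value in read_meta_properties(SAMPLE_PAGE).items():
        print(name, "=", value)
```

Note that the body text ("Hello") never enters the result: the bot only sees what the properties point it to.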
How images and text are coded into the site
EveryoneSocial and all other social media platforms use Open Graph or “OG” tags to make image and text selections. As discussed above, Open Graph tags are just the fancy name for the labels that tell the bots what content should be used to create a content card.
Social media platforms require three OG tags to create content or preview cards. The three OG tags are the title, image, and description.
Lost on that last bit? That’s OK. The website administrator who created the page will have the required developer permissions to make the necessary changes and will be familiar with these concepts. We recommend sharing this article and our troubleshooting article with your web admins to ensure the page is social media-ready.
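For web admins, the sketch below shows one way to check whether a page supplies all three tags. It assumes the page's meta properties have already been collected into a dictionary; the function name `missing_og_tags` is hypothetical and chosen only for this example.

```python
# The three Open Graph tags needed for a content card. In a page's
# <head>, each one typically looks like:
#   <meta property="og:title" content="..." />
REQUIRED_OG_TAGS = ("og:title", "og:image", "og:description")


def missing_og_tags(properties):
    """Return the required OG tags that are absent or empty.

    `properties` is assumed to be a dict mapping meta property names
    to their content values.
    """
    return [tag for tag in REQUIRED_OG_TAGS if not properties.get(tag)]


if __name__ == "__main__":
    page_properties = {
        "og:title": "Example Title",
        "og:description": "",  # present but empty counts as missing
    }
    print(missing_og_tags(page_properties))
```

An empty result means the page has everything a scraper needs; any tag in the list is one the scraper would have to guess at.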
How EveryoneSocial finds images and text
If every web developer used the industry best practice of applying the OG tags mentioned above, not much else would need to be said here. But regrettably, that’s not the case. Some web developers design pages that look pleasing to the eye but lack the necessary tags that enable social media sharing.
So what happens then? We can’t answer for other social media platforms, but EveryoneSocial uses several layers of redundancy to achieve the best possible results, even when the webpage doesn’t follow standard conventions.
First, we use three different scraping bot services.
Rather than rely upon one scraper service, we’ve built in redundancy to scrape as many web page varieties as possible. We scrape webpages in sequence: EveryoneSocialBot first, then Embed.ly, then Iframely, passing to the next service if errors are encountered.
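The "try one service, then pass to the next" pattern can be sketched as a simple loop. This is only an illustration under stated assumptions: the `ScrapeError` exception, the `scrape_with_fallback` function, and the two stand-in provider functions are hypothetical and do not reflect the real services' APIs.

```python
class ScrapeError(Exception):
    """Raised by a provider that could not build a content card."""


def scrape_with_fallback(url, providers):
    """Try each (name, scrape_function) pair in order.

    Returns (provider_name, card) from the first provider that succeeds.
    Raises ScrapeError if every provider fails.
    """
    errors = []
    for name, scrape in providers:
        try:
            return name, scrape(url)
        except ScrapeError as exc:
            # Record the failure and hand the attempt to the next provider.
            errors.append(f"{name}: {exc}")
    raise ScrapeError("all providers failed (" + "; ".join(errors) + ")")


# Stand-in providers used only to demonstrate the control flow.
def failing_provider(url):
    raise ScrapeError("required tags not found")


def working_provider(url):
    return {"title": "Example Title", "image": None, "description": ""}


if __name__ == "__main__":
    chain = [("first", failing_provider), ("second", working_provider)]
    name, card = scrape_with_fallback("https://example.com/post", chain)
    print(name, card["title"])
```

Keeping the providers in an ordered list means the sequence can change without touching the loop itself.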
Second, we fall back to HTML tags if the OG tags are missing.
OG tags tell crawler bots specifically what content to use. In contrast, HTML tags work like broad categories; they tell bots what type of content to expect but don’t specify what content to select.
If the OG tags are missing, the three HTML tags we fall back on are, you guessed it, the title, image, and description.
Here’s a bit more detail on how it all fits together:
- First, we scrape the page using our internally developed crawler, EveryoneSocialBot. EveryoneSocialBot looks for Open Graph tags. If the three necessary tags are found, a content card appears.
- If one or more Open Graph tags are missing, we fall back to other methods:
  - og:title missing = EveryoneSocialBot scrapes for the HTML title tag.
  - og:image missing = EveryoneSocialBot scrapes for the HTML tag <image> (if present) and loads the first five images from the article, selecting the largest one. Images smaller than 300x200 are rejected.
  - og:description missing = EveryoneSocialBot scrapes for the HTML tag <meta property='description' content=... /> (if present). If not found, we fall back to the longest single text block and truncate it to 500 characters.
- If one or all of these efforts fail, the scrape attempt is handed to the next provider, Embed.ly, which repeats a similar action using different mechanisms and fallbacks.
- If Embed.ly fails, the scrape attempt is handed to the next provider, Iframely, which repeats a similar effort using slightly different mechanisms and fallbacks.
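The image and description fallbacks described above can be sketched roughly as follows. This is a simplified illustration, not EveryoneSocialBot's actual code. It makes several assumptions: "largest" is measured by pixel area, an image is rejected if it is narrower than 300 pixels or shorter than 200 pixels, and the page's images and text blocks have already been extracted. The names `Image`, `pick_image`, and `pick_description` are hypothetical.

```python
from dataclasses import dataclass

MIN_WIDTH, MIN_HEIGHT = 300, 200   # images below this size are rejected
MAX_CANDIDATES = 5                 # only the first five images are considered
MAX_DESCRIPTION_CHARS = 500        # fallback descriptions are truncated


@dataclass
class Image:
    url: str
    width: int
    height: int


def pick_image(og_image, page_images):
    """Use og:image if present, else the largest acceptable early image."""
    if og_image:
        return og_image
    candidates = [
        img for img in page_images[:MAX_CANDIDATES]
        if img.width >= MIN_WIDTH and img.height >= MIN_HEIGHT
    ]
    if not candidates:
        return None
    # Assumption: "largest" means the greatest pixel area.
    return max(candidates, key=lambda img: img.width * img.height).url


def pick_description(og_description, meta_description, text_blocks):
    """Use og:description, then the meta description, then the longest text."""
    if og_description:
        return og_description
    if meta_description:
        return meta_description
    if not text_blocks:
        return None
    return max(text_blocks, key=len)[:MAX_DESCRIPTION_CHARS]


if __name__ == "__main__":
    images = [Image("icon.png", 64, 64), Image("hero.jpg", 1200, 630)]
    print(pick_image(None, images))
    print(pick_description(None, None, ["Short intro.", "A longer paragraph of body text."]))
```

When either function returns `None`, nothing suitable was found; in the flow described above, that is the point at which the attempt would pass to the next provider.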
If all efforts fail, a notification appears in the app:
So far, we’ve talked about how everything works when things go according to plan, but if you’ve made it this far, you’ve likely found a link that’s giving you a bit of trouble. If so, we’re here to help! Start your next troubleshooting step here.