How to Extract Metadata from Any URL

What is URL Metadata?

Every webpage contains hidden metadata that describes its content. This metadata includes the page title, description, Open Graph tags for social media previews, Twitter Card information, favicon URLs, structured data (JSON-LD), and more.

When you paste a link into Slack, Discord, or any messaging app, the platform fetches this metadata to build the rich preview card you see. Building that same capability into your own application requires parsing HTML, handling edge cases, and dealing with redirects — or you can use a dedicated metadata extraction API.

Why Extract Metadata Programmatically?

There are many practical use cases for URL metadata extraction:

  • Link previews — Show rich cards when users paste URLs in your chat app, CMS, or social platform
  • SEO auditing — Check if your pages have correct Open Graph tags, Twitter Cards, and structured data
  • Content aggregation — Build news readers, bookmark managers, or content curation tools
  • Bot detection — Verify that URLs are legitimate before displaying them to users
  • RAG pipelines — Extract clean page text for AI embeddings and retrieval-augmented generation
  • Competitive analysis — Monitor how competitors present their pages on social media

Extracting Metadata with LinkMeta

LinkMeta provides a free REST API that extracts 20+ metadata fields from any URL. No API key required, no signup needed.

Quick Start

Using cURL:

curl "https://linkmeta.dev/api/v1/extract?url=https://github.com"

Using JavaScript:

const response = await fetch('https://linkmeta.dev/api/v1/extract?url=https://github.com');
const data = await response.json();

console.log(data.data.title);       // "GitHub: Let's build from here"
console.log(data.data.description); // "GitHub is where over 100 million..."
console.log(data.data.image);       // OG image URL
console.log(data.data.favicon);     // Best quality favicon

Using Python:

import requests

response = requests.get('https://linkmeta.dev/api/v1/extract', params={
    'url': 'https://github.com'
})

data = response.json()['data']
print(f"Title: {data['title']}")
print(f"Description: {data['description']}")
print(f"Image: {data['image']}")

What Metadata Is Returned?

A single LinkMeta API call returns all of these fields:

  • title — Page title from <title> or og:title
  • description — Meta description or og:description
  • image — Best available preview image (OG, Twitter, or body)
  • imageSource — Where the image was found (og, twitter, or body)
  • favicon — Highest-quality favicon URL
  • favicons — All discovered favicons with sizes and types
  • canonical — Canonical URL of the page
  • language — Detected language code (e.g., en, de)
  • author — Page author, if specified
  • published_time — Publication date for articles
  • keywords — Meta keywords array
  • og — Full Open Graph tag object
  • twitter — Full Twitter Card tag object
  • json_ld — All JSON-LD structured data
  • theme_color — Browser theme color
  • word_count — Word count of the page body
  • summary — Auto-extracted text summary
  • body — Clean extracted body text (opt-in)
  • redirectChain — Full redirect chain from the input URL to the final URL
  • statusCode — HTTP status code of the target
  • contentType — MIME type of the response
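In Python, a successful response can be unpacked into just the fields you care about. A minimal sketch — the sample dict below is illustrative, shaped like the field list above, not a real API payload:

```python
# Hypothetical example response, shaped like the field list above
# (not a real API payload).
sample = {
    "status": "success",
    "data": {
        "title": "Example Domain",
        "description": "An example page.",
        "image": "https://example.com/og.png",
        "imageSource": "og",
        "statusCode": 200,
    },
}

def pick(data, *fields):
    """Return only the requested metadata fields, skipping any that are absent."""
    return {f: data[f] for f in fields if f in data}

meta = sample["data"]
preview = pick(meta, "title", "description", "image", "favicon")
```

Fields the page simply doesn't have (here, favicon) are skipped rather than returned as None, which keeps downstream rendering code simple.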

Selecting Specific Fields

You don't always need every field. Use the fields parameter to request only what you need:

curl "https://linkmeta.dev/api/v1/extract?url=https://github.com&fields=title,description,image,favicon"

This reduces response size and processing time.
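Building the request URL in code keeps the comma-separated fields value properly encoded. A small helper sketch (build_extract_url is our own name, not part of any SDK):

```python
from urllib.parse import urlencode

API = "https://linkmeta.dev/api/v1/extract"

def build_extract_url(target, fields=None):
    """Build an extract request URL, optionally limiting the returned fields."""
    params = {"url": target}
    if fields:
        # The API expects a comma-separated list in a single `fields` parameter.
        params["fields"] = ",".join(fields)
    return f"{API}?{urlencode(params)}"

request_url = build_extract_url("https://github.com", ["title", "description", "image"])
```

urlencode also percent-encodes the target URL itself, so links with query strings of their own pass through safely.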

Extracting Clean Body Text for AI

For RAG (Retrieval-Augmented Generation) pipelines, you often need clean page text without HTML tags:

curl "https://linkmeta.dev/api/v1/extract?url=https://example.com&fields=body,summary&summary_length=500"

The body field returns up to 5000 characters of clean, structured text — perfect for creating embeddings or feeding into LLM context.
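A typical next step in a RAG pipeline is to split that body text into overlapping chunks before embedding. A sketch with illustrative chunk sizes (the string below stands in for the API's body field):

```python
def chunk_text(text, size=1000, overlap=100):
    """Split extracted body text into overlapping chunks for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

body = ("lorem ipsum " * 500).strip()  # stand-in for the API's `body` field
chunks = chunk_text(body)
```

The overlap means a sentence cut at a chunk boundary still appears whole in the next chunk, which tends to improve retrieval quality.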

Batch Extraction

Need metadata from multiple URLs? Use the batch endpoint instead of making sequential requests:

curl -X POST "https://linkmeta.dev/api/v1/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://github.com",
      "https://stackoverflow.com",
      "https://dev.to"
    ],
    "fields": "title,description,image"
  }'

The batch endpoint processes up to 10 URLs in parallel.
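From Python, the JSON body can be assembled with a small helper that also enforces the 10-URL limit client-side (build_batch_payload is our own name, a sketch rather than an official client):

```python
import json

def build_batch_payload(urls, fields=None):
    """Build the JSON body for the batch endpoint (at most 10 URLs per call)."""
    if not 1 <= len(urls) <= 10:
        raise ValueError("batch accepts between 1 and 10 URLs")
    payload = {"urls": list(urls)}
    if fields:
        payload["fields"] = ",".join(fields)
    return json.dumps(payload)

body = build_batch_payload(
    ["https://github.com", "https://stackoverflow.com", "https://dev.to"],
    fields=["title", "description", "image"],
)
```

The resulting string is what you would POST with a Content-Type: application/json header, matching the cURL example above.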

Handling Edge Cases

Real-world URL extraction involves many edge cases. Here's how LinkMeta handles them:

Redirects

Many URLs go through one or more redirects before reaching the final page. LinkMeta follows the entire redirect chain and returns it:

{
  "redirectChain": [
    { "url": "http://github.com", "statusCode": 301 },
    { "url": "https://github.com/", "statusCode": 200 }
  ]
}
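In client code, the last hop of the chain is where the content was actually served, which is usually the URL you want to store or display. A minimal sketch over the response shape shown above:

```python
def final_url(redirect_chain):
    """Return the URL of the last hop, i.e. where the content was actually served."""
    return redirect_chain[-1]["url"] if redirect_chain else None

chain = [
    {"url": "http://github.com", "statusCode": 301},
    {"url": "https://github.com/", "statusCode": 200},
]
destination = final_url(chain)
```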

JavaScript-Rendered Pages

Some single-page applications (SPAs) don't include meta tags in the initial HTML — they're injected by JavaScript. LinkMeta handles both server-rendered and client-rendered pages.

Invalid or Unreachable URLs

If a URL is unreachable, returns an error status, or fails DNS resolution, LinkMeta returns a clear error response:

{
  "status": "error",
  "error": {
    "code": "FETCH_FAILED",
    "message": "DNS resolution failed for the provided URL"
  }
}
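Client code should branch on the status field before touching data. One way to sketch that in Python (ExtractError and unwrap are our own names for illustration):

```python
class ExtractError(Exception):
    """Raised when the API reports an error for the requested URL."""

def unwrap(response_json):
    """Return the data payload, or raise ExtractError on an error response."""
    if response_json.get("status") == "error":
        err = response_json.get("error", {})
        raise ExtractError(f"{err.get('code')}: {err.get('message')}")
    return response_json["data"]

meta = unwrap({"status": "success", "data": {"title": "Example"}})
```

Raising a typed exception lets callers catch extraction failures separately from network errors and fall back to a plain link.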

Timeout Configuration

For slow-responding targets, increase the timeout:

curl "https://linkmeta.dev/api/v1/extract?url=https://slow-site.com&timeout=20000"

The default timeout is 10 seconds, configurable up to 30 seconds.
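If you expose the timeout to your own users, it is worth clamping it to the documented range before building the request. A small sketch (effective_timeout is our own helper name):

```python
DEFAULT_TIMEOUT_MS = 10_000  # API default: 10 seconds
MAX_TIMEOUT_MS = 30_000      # documented maximum: 30 seconds

def effective_timeout(requested_ms=None):
    """Clamp a requested timeout to the API's documented range."""
    if requested_ms is None:
        return DEFAULT_TIMEOUT_MS
    return min(max(requested_ms, 1), MAX_TIMEOUT_MS)
```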

Building a Link Preview Component

Here's a complete example of building a link preview component using LinkMeta:

React Component

import { useState, useEffect } from 'react';

function LinkPreview({ url }) {
  const [meta, setMeta] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(`https://linkmeta.dev/api/v1/extract?url=${encodeURIComponent(url)}&fields=title,description,image,favicon`)
      .then(res => res.json())
      .then(data => {
        setMeta(data.data);
        setLoading(false);
      })
      .catch(() => setLoading(false));
  }, [url]);

  if (loading) return <div className="link-preview-skeleton" />;
  if (!meta) return <a href={url}>{url}</a>;

  return (
    <a href={url} className="link-preview" target="_blank" rel="noopener">
      {meta.image && <img src={meta.image} alt={meta.title} />}
      <div className="link-preview-content">
        <h3>{meta.title}</h3>
        <p>{meta.description}</p>
        <span className="link-preview-domain">
          {meta.favicon && <img src={meta.favicon} width="16" height="16" alt="" />}
          {new URL(url).hostname}
        </span>
      </div>
    </a>
  );
}

Validating Metadata Quality

Beyond extraction, LinkMeta also validates metadata quality with a score:

curl "https://linkmeta.dev/api/v1/validate?url=https://yoursite.com"

This returns a validation report with:

  • Score (0-100) and letter grade
  • Platform readiness for Facebook, Twitter, and LinkedIn
  • Specific issues like missing og:image or truncated titles
  • Recommendations for improvement
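A common use is gating a deploy or CI step on the score. A sketch using a hypothetical report dict shaped like the list above (the threshold is ours, not the API's):

```python
# Hypothetical validation report, shaped like the list above.
report = {
    "score": 72,
    "grade": "B",
    "issues": ["missing og:image", "title longer than 60 characters"],
}

def metadata_ok(report, min_score=80):
    """Pass only when the score clears the threshold and no issues remain."""
    return report["score"] >= min_score and not report["issues"]

ok = metadata_ok(report)
```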

Why Choose LinkMeta?

  • Price — LinkMeta: free forever; alternatives: often paid after a trial
  • API key — LinkMeta: not required; alternatives: usually required
  • Fields extracted — LinkMeta: 20+; alternatives: typically 5-10
  • Batch support — LinkMeta: up to 10 URLs; alternatives: varies
  • Body text (RAG) — LinkMeta: yes (opt-in); alternatives: rare
  • Favicon discovery — LinkMeta: all sizes and types; alternatives: usually just one
  • Caching — LinkMeta: built-in; alternatives: often not included

LinkMeta is part of the SoftVoyagers ecosystem — a collection of free developer tools. Generate Open Graph images with OGForge, create QR codes with QRMint, or shorten URLs with LinkShrink.

Frequently Asked Questions

Is there a rate limit? Yes, but it's generous for free usage. The API handles typical development and production workloads without issues.

Does LinkMeta cache results? Yes, results are cached to improve response times. Use no_cache=true to force a fresh fetch when needed.

Can I use LinkMeta in production? Absolutely. LinkMeta runs on Azure with high availability and is designed for production use.

What about CORS? LinkMeta supports CORS, so you can call the API directly from browser JavaScript.


Start extracting metadata now — try the interactive playground or read the API documentation.
