How to Extract Metadata from Any URL

What is URL Metadata?

Every webpage contains hidden metadata that describes its content. This metadata includes the page title, description, Open Graph tags for social media previews, Twitter Card information, favicon URLs, structured data (JSON-LD), and more.

When you paste a link into Slack, Discord, or any messaging app, the platform fetches this metadata to build the rich preview card you see. Building that same capability into your own application requires parsing HTML, handling edge cases, and dealing with redirects — or you can use a dedicated metadata extraction API.

Why Extract Metadata Programmatically?

There are many practical use cases for URL metadata extraction:

  • Link previews — Show rich cards when users paste URLs in your chat app, CMS, or social platform
  • SEO auditing — Check if your pages have correct Open Graph tags, Twitter Cards, and structured data
  • Content aggregation — Build news readers, bookmark managers, or content curation tools
  • Bot detection — Verify that URLs are legitimate before displaying them to users
  • RAG pipelines — Extract clean page text for AI embeddings and retrieval-augmented generation
  • Competitive analysis — Monitor how competitors present their pages on social media

Extracting Metadata with LinkMeta

LinkMeta provides a free REST API that extracts 20+ metadata fields from any URL. No API key required, no signup needed.

Quick Start

Using cURL:

curl "https://linkmeta.dev/api/v1/extract?url=https://github.com"

Using JavaScript:

const response = await fetch('https://linkmeta.dev/api/v1/extract?url=https://github.com');
const data = await response.json();

console.log(data.data.title);       // "GitHub: Let's build from here"
console.log(data.data.description); // "GitHub is where over 100 million..."
console.log(data.data.image);       // OG image URL
console.log(data.data.favicon);     // Best quality favicon

Using Python:

import requests

response = requests.get('https://linkmeta.dev/api/v1/extract', params={
    'url': 'https://github.com'
})

data = response.json()['data']
print(f"Title: {data['title']}")
print(f"Description: {data['description']}")
print(f"Image: {data['image']}")

What Metadata Is Returned?

A single LinkMeta API call returns all of these fields:

  • title — Page title from <title> or og:title
  • description — Meta description or og:description
  • image — Best available preview image (OG, Twitter, or body)
  • imageSource — Where the image was found (og, twitter, or body)
  • favicon — Highest-quality favicon URL
  • favicons — All discovered favicons with sizes and types
  • canonical — Canonical URL of the page
  • language — Detected language code (e.g., en, de)
  • author — Page author, if specified
  • published_time — Publication date for articles
  • keywords — Meta keywords array
  • og — Full Open Graph tag object
  • twitter — Full Twitter Card tag object
  • json_ld — All JSON-LD structured data
  • theme_color — Browser theme color
  • word_count — Word count of the page body
  • summary — Auto-extracted text summary
  • body — Clean extracted body text (opt-in)
  • redirectChain — Full redirect chain from the input URL to the final URL
  • statusCode — HTTP status code of the target
  • contentType — MIME type of the response
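In Python, a successful response can be unpacked into just the fields you care about. A minimal sketch — the sample dict below is illustrative, shaped like the field list above, not a real API payload:

```python
# Hypothetical example response, shaped like the field list above
# (not a real API payload).
sample = {
    "status": "success",
    "data": {
        "title": "Example Domain",
        "description": "An example page.",
        "image": "https://example.com/og.png",
        "imageSource": "og",
        "statusCode": 200,
    },
}

def pick(data, *fields):
    """Return only the requested metadata fields, skipping any that are absent."""
    return {f: data[f] for f in fields if f in data}

meta = sample["data"]
preview = pick(meta, "title", "description", "image", "favicon")
```

Fields the page simply doesn't have (here, favicon) are skipped rather than returned as None, which keeps downstream rendering code simple.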

Selecting Specific Fields

You don't always need every field. Use the fields parameter to request only what you need:

curl "https://linkmeta.dev/api/v1/extract?url=https://github.com&fields=title,description,image,favicon"

This reduces response size and processing time.
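Building the request URL in code keeps the comma-separated fields value properly encoded. A small helper sketch (build_extract_url is our own name, not part of any SDK):

```python
from urllib.parse import urlencode

API = "https://linkmeta.dev/api/v1/extract"

def build_extract_url(target, fields=None):
    """Build an extract request URL, optionally limiting the returned fields."""
    params = {"url": target}
    if fields:
        # The API expects a comma-separated list in a single `fields` parameter.
        params["fields"] = ",".join(fields)
    return f"{API}?{urlencode(params)}"

request_url = build_extract_url("https://github.com", ["title", "description", "image"])
```

urlencode also percent-encodes the target URL itself, so links with query strings of their own pass through safely.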

Extracting Clean Body Text for AI

For RAG (Retrieval-Augmented Generation) pipelines, you often need clean page text without HTML tags:

curl "https://linkmeta.dev/api/v1/extract?url=https://example.com&fields=body,summary&summary_length=500"

The body field returns up to 5000 characters of clean, structured text — perfect for creating embeddings or feeding into LLM context.
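A typical next step in a RAG pipeline is to split that body text into overlapping chunks before embedding. A sketch with illustrative chunk sizes (the string below stands in for the API's body field):

```python
def chunk_text(text, size=1000, overlap=100):
    """Split extracted body text into overlapping chunks for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

body = ("lorem ipsum " * 500).strip()  # stand-in for the API's `body` field
chunks = chunk_text(body)
```

The overlap means a sentence cut at a chunk boundary still appears whole in the next chunk, which tends to improve retrieval quality.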

Batch Extraction

Need metadata from multiple URLs? Use the batch endpoint instead of making sequential requests:

curl -X POST "https://linkmeta.dev/api/v1/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://github.com",
      "https://stackoverflow.com",
      "https://dev.to"
    ],
    "fields": "title,description,image"
  }'

The batch endpoint processes up to 10 URLs in parallel.
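From Python, the JSON body can be assembled with a small helper that also enforces the 10-URL limit client-side (build_batch_payload is our own name, a sketch rather than an official client):

```python
import json

def build_batch_payload(urls, fields=None):
    """Build the JSON body for the batch endpoint (at most 10 URLs per call)."""
    if not 1 <= len(urls) <= 10:
        raise ValueError("batch accepts between 1 and 10 URLs")
    payload = {"urls": list(urls)}
    if fields:
        payload["fields"] = ",".join(fields)
    return json.dumps(payload)

body = build_batch_payload(
    ["https://github.com", "https://stackoverflow.com", "https://dev.to"],
    fields=["title", "description", "image"],
)
```

The resulting string is what you would POST with a Content-Type: application/json header, matching the cURL example above.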

Handling Edge Cases

Real-world URL extraction involves many edge cases. Here's how LinkMeta handles them:

Redirects

Many URLs go through one or more redirects before reaching the final page. LinkMeta follows the entire redirect chain and returns it:

{
  "redirectChain": [
    { "url": "http://github.com", "statusCode": 301 },
    { "url": "https://github.com/", "statusCode": 200 }
  ]
}
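In client code, the last hop of the chain is where the content was actually served, which is usually the URL you want to store or display. A minimal sketch over the response shape shown above:

```python
def final_url(redirect_chain):
    """Return the URL of the last hop, i.e. where the content was actually served."""
    return redirect_chain[-1]["url"] if redirect_chain else None

chain = [
    {"url": "http://github.com", "statusCode": 301},
    {"url": "https://github.com/", "statusCode": 200},
]
destination = final_url(chain)
```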

JavaScript-Rendered Pages

Some single-page applications (SPAs) don't include meta tags in the initial HTML — they're injected by JavaScript. LinkMeta handles both server-rendered and client-rendered pages.

Invalid or Unreachable URLs

If a URL is unreachable, returns an error status, or fails DNS resolution, LinkMeta returns a clear error response:

{
  "status": "error",
  "error": {
    "code": "FETCH_FAILED",
    "message": "DNS resolution failed for the provided URL"
  }
}
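Client code should branch on the status field before touching data. One way to sketch that in Python (ExtractError and unwrap are our own names for illustration):

```python
class ExtractError(Exception):
    """Raised when the API reports an error for the requested URL."""

def unwrap(response_json):
    """Return the data payload, or raise ExtractError on an error response."""
    if response_json.get("status") == "error":
        err = response_json.get("error", {})
        raise ExtractError(f"{err.get('code')}: {err.get('message')}")
    return response_json["data"]

meta = unwrap({"status": "success", "data": {"title": "Example"}})
```

Raising a typed exception lets callers catch extraction failures separately from network errors and fall back to a plain link.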

Timeout Configuration

For slow-responding targets, increase the timeout:

curl "https://linkmeta.dev/api/v1/extract?url=https://slow-site.com&timeout=20000"

The default timeout is 10 seconds, configurable up to 30 seconds.
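If you expose the timeout to your own users, it is worth clamping it to the documented range before building the request. A small sketch (effective_timeout is our own helper name):

```python
DEFAULT_TIMEOUT_MS = 10_000  # API default: 10 seconds
MAX_TIMEOUT_MS = 30_000      # documented maximum: 30 seconds

def effective_timeout(requested_ms=None):
    """Clamp a requested timeout to the API's documented range."""
    if requested_ms is None:
        return DEFAULT_TIMEOUT_MS
    return min(max(requested_ms, 1), MAX_TIMEOUT_MS)
```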

Building a Link Preview Component

Here's a complete example of building a link preview component using LinkMeta:

React Component

import { useState, useEffect } from 'react';

function LinkPreview({ url }) {
  const [meta, setMeta] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(`https://linkmeta.dev/api/v1/extract?url=${encodeURIComponent(url)}&fields=title,description,image,favicon`)
      .then(res => res.json())
      .then(data => {
        setMeta(data.data);
        setLoading(false);
      })
      .catch(() => setLoading(false));
  }, [url]);

  if (loading) return <div className="link-preview-skeleton" />;
  if (!meta) return <a href={url}>{url}</a>;

  return (
    <a href={url} className="link-preview" target="_blank" rel="noopener">
      {meta.image && <img src={meta.image} alt={meta.title} />}
      <div className="link-preview-content">
        <h3>{meta.title}</h3>
        <p>{meta.description}</p>
        <span className="link-preview-domain">
          {meta.favicon && <img src={meta.favicon} width="16" height="16" alt="" />}
          {new URL(url).hostname}
        </span>
      </div>
    </a>
  );
}

Validating Metadata Quality

Beyond extraction, LinkMeta also validates metadata quality with a score:

curl "https://linkmeta.dev/api/v1/validate?url=https://yoursite.com"

This returns a validation report with:

  • Score (0-100) and letter grade
  • Platform readiness for Facebook, Twitter, and LinkedIn
  • Specific issues like missing og:image or truncated titles
  • Recommendations for improvement
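A common use is gating a deploy or CI step on the score. A sketch using a hypothetical report dict shaped like the list above (the threshold is ours, not the API's):

```python
# Hypothetical validation report, shaped like the list above.
report = {
    "score": 72,
    "grade": "B",
    "issues": ["missing og:image", "title longer than 60 characters"],
}

def metadata_ok(report, min_score=80):
    """Pass only when the score clears the threshold and no issues remain."""
    return report["score"] >= min_score and not report["issues"]

ok = metadata_ok(report)
```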

Why Choose LinkMeta?

  • Price — LinkMeta: free forever; alternatives: often paid after a trial
  • API key — LinkMeta: not required; alternatives: usually required
  • Fields extracted — LinkMeta: 20+; alternatives: typically 5-10
  • Batch support — LinkMeta: up to 10 URLs; alternatives: varies
  • Body text (RAG) — LinkMeta: yes (opt-in); alternatives: rare
  • Favicon discovery — LinkMeta: all sizes and types; alternatives: usually just one
  • Caching — LinkMeta: built-in; alternatives: often not included

LinkMeta is part of the SoftVoyagers ecosystem — a collection of free developer tools. Generate Open Graph images with OGForge, create QR codes with QRMint, or shorten URLs with LinkShrink.

Frequently Asked Questions

Is there a rate limit? Yes, but it's generous for free usage. The API handles typical development and production workloads without issues.

Does LinkMeta cache results? Yes, results are cached to improve response times. Use no_cache=true to force a fresh fetch when needed.

Can I use LinkMeta in production? Absolutely. LinkMeta runs on Azure with high availability and is designed for production use.

What about CORS? LinkMeta supports CORS, so you can call the API directly from browser JavaScript.


Start extracting metadata now — try the interactive playground or read the API documentation.
