# How to Extract Metadata from Any URL
## What is URL Metadata?
Every webpage contains hidden metadata that describes its content. This metadata includes the page title, description, Open Graph tags for social media previews, Twitter Card information, favicon URLs, structured data (JSON-LD), and more.
When you paste a link into Slack, Discord, or any messaging app, the platform fetches this metadata to build the rich preview card you see. Building that same capability into your own application requires parsing HTML, handling edge cases, and dealing with redirects — or you can use a dedicated metadata extraction API.
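To get a feel for what that involves, here is a minimal sketch of extracting Open Graph tags by hand with Python's standard-library HTML parser. This covers only one small piece of the job — redirects, favicon discovery, JSON-LD, encoding quirks, and client-rendered pages are all extra work on top:

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:*" content="..."> tags from raw HTML."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            prop = attr.get("property", "")
            if prop.startswith("og:") and "content" in attr:
                self.og[prop] = attr["content"]

parser = OGParser()
parser.feed('<meta property="og:title" content="Example"><meta name="desc" content="n">')
print(parser.og)  # {'og:title': 'Example'}
```

Doing this reliably across the open web is exactly the grind a dedicated API is meant to absorb.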
## Why Extract Metadata Programmatically?
There are many practical use cases for URL metadata extraction:
- **Link previews** — Show rich cards when users paste URLs in your chat app, CMS, or social platform
- **SEO auditing** — Check if your pages have correct Open Graph tags, Twitter Cards, and structured data
- **Content aggregation** — Build news readers, bookmark managers, or content curation tools
- **Bot detection** — Verify that URLs are legitimate before displaying them to users
- **RAG pipelines** — Extract clean page text for AI embeddings and retrieval-augmented generation
- **Competitive analysis** — Monitor how competitors present their pages on social media
## Extracting Metadata with LinkMeta
LinkMeta provides a free REST API that extracts 20+ metadata fields from any URL. No API key required, no signup needed.
### Quick Start

Using cURL:

```bash
curl "https://linkmeta.dev/api/v1/extract?url=https://github.com"
```
Using JavaScript:

```javascript
const response = await fetch('https://linkmeta.dev/api/v1/extract?url=https://github.com');
const data = await response.json();

console.log(data.data.title);       // "GitHub: Let's build from here"
console.log(data.data.description); // "GitHub is where over 100 million..."
console.log(data.data.image);       // OG image URL
console.log(data.data.favicon);     // Best quality favicon
```
Using Python:

```python
import requests

response = requests.get('https://linkmeta.dev/api/v1/extract', params={
    'url': 'https://github.com'
})
data = response.json()['data']

print(f"Title: {data['title']}")
print(f"Description: {data['description']}")
print(f"Image: {data['image']}")
```
## What Metadata Is Returned?
A single LinkMeta API call returns all of these fields:
| Field | Description |
|---|---|
| `title` | Page title from `<title>` or `og:title` |
| `description` | Meta description or `og:description` |
| `image` | Best available preview image (OG, Twitter, or body) |
| `imageSource` | Where the image was found (`og`, `twitter`, `body`) |
| `favicon` | Highest quality favicon URL |
| `favicons` | All discovered favicons with sizes and types |
| `canonical` | Canonical URL of the page |
| `language` | Detected language code (e.g., `en`, `de`) |
| `author` | Page author if specified |
| `published_time` | Publication date for articles |
| `keywords` | Meta keywords array |
| `og` | Full Open Graph tag object |
| `twitter` | Full Twitter Card tag object |
| `json_ld` | All JSON-LD structured data |
| `theme_color` | Browser theme color |
| `word_count` | Word count of page body |
| `summary` | Auto-extracted text summary |
| `body` | Clean extracted body text (opt-in) |
| `redirectChain` | Full redirect chain from input to final URL |
| `statusCode` | HTTP status code of the target |
| `contentType` | MIME type of the response |
## Selecting Specific Fields

You don't always need every field. Use the `fields` parameter to request only what you need:

```bash
curl "https://linkmeta.dev/api/v1/extract?url=https://github.com&fields=title,description,image,favicon"
```
This reduces response size and processing time.
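In application code, it helps to build that query string in one place. A minimal Python sketch, using only the parameters shown in the examples above (the helper itself is illustrative, not part of the API):

```python
from urllib.parse import urlencode

API_BASE = "https://linkmeta.dev/api/v1/extract"

def build_extract_url(target_url, fields=None, **params):
    """Build a LinkMeta extract URL, optionally limiting the returned fields."""
    query = {"url": target_url}
    if fields:
        query["fields"] = ",".join(fields)  # comma-separated field list
    query.update(params)  # e.g. timeout=20000
    return f"{API_BASE}?{urlencode(query)}"

url = build_extract_url("https://github.com", fields=["title", "description"])
print(url)
```

Note that `urlencode` percent-encodes the target URL and the commas, which any standards-compliant server will decode back.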
## Extracting Clean Body Text for AI

For RAG (Retrieval-Augmented Generation) pipelines, you often need clean page text without HTML tags:

```bash
curl "https://linkmeta.dev/api/v1/extract?url=https://example.com&fields=body,summary&summary_length=500"
```

The `body` field returns up to 5000 characters of clean, structured text — perfect for creating embeddings or feeding into LLM context.
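Once you have the body text, a common next step is splitting it into overlapping chunks before embedding. A minimal sketch — the chunk size and overlap are arbitrary choices for illustration, not LinkMeta requirements:

```python
def chunk_text(text, size=1000, overlap=100):
    """Split extracted body text into overlapping chunks for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

chunks = chunk_text("x" * 2500)
print(len(chunks))  # 3
```

The overlap preserves context that would otherwise be cut mid-sentence at chunk boundaries.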
## Batch Extraction

Need metadata from multiple URLs? Use the batch endpoint instead of making sequential requests:

```bash
curl -X POST "https://linkmeta.dev/api/v1/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://github.com",
      "https://stackoverflow.com",
      "https://dev.to"
    ],
    "fields": "title,description,image"
  }'
```
The batch endpoint processes up to 10 URLs in parallel.
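You can enforce that limit client-side before posting. A small Python helper whose payload mirrors the cURL example above (the validation logic is my own, not part of the API):

```python
import json

BATCH_LIMIT = 10  # the batch endpoint processes up to 10 URLs per call

def build_batch_payload(urls, fields=None):
    """Build the JSON body for LinkMeta's batch endpoint."""
    if not urls or len(urls) > BATCH_LIMIT:
        raise ValueError(f"batch calls take 1-{BATCH_LIMIT} URLs")
    payload = {"urls": list(urls)}
    if fields:
        payload["fields"] = ",".join(fields)
    return json.dumps(payload)

body = build_batch_payload(["https://github.com", "https://dev.to"],
                           fields=["title", "image"])
print(body)
```

POST the result to `https://linkmeta.dev/api/v1/batch` with a `Content-Type: application/json` header, as in the cURL example.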
## Handling Edge Cases
Real-world URL extraction involves many edge cases. Here's how LinkMeta handles them:
### Redirects
Many URLs go through one or more redirects before reaching the final page. LinkMeta follows the entire redirect chain and returns it:
```json
{
  "redirectChain": [
    { "url": "http://github.com", "statusCode": 301 },
    { "url": "https://github.com/", "statusCode": 200 }
  ]
}
```
### JavaScript-Rendered Pages
Some single-page applications (SPAs) don't include meta tags in the initial HTML — they're injected by JavaScript. LinkMeta handles both server-rendered and client-rendered pages.
### Invalid or Unreachable URLs

If a URL is unreachable, returns an error status, or fails DNS resolution, LinkMeta returns a clear error response:
```json
{
  "status": "error",
  "error": {
    "code": "FETCH_FAILED",
    "message": "DNS resolution failed for the provided URL"
  }
}
```
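Client code can normalize both outcomes with a small guard. This sketch assumes successful responses wrap their payload in a `data` key, as in the quick-start examples:

```python
def unwrap(payload):
    """Return extracted metadata, or raise using the API's error envelope."""
    if payload.get("status") == "error":
        err = payload.get("error", {})
        raise RuntimeError(f"{err.get('code', 'UNKNOWN')}: {err.get('message', '')}")
    return payload["data"]

meta = unwrap({"data": {"title": "Example"}})
print(meta["title"])  # Example
```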
### Timeout Configuration

For slow-responding targets, increase the `timeout` parameter (in milliseconds):

```bash
curl "https://linkmeta.dev/api/v1/extract?url=https://slow-site.com&timeout=20000"
```
The default timeout is 10 seconds, configurable up to 30 seconds.
## Building a Link Preview Component
Here's a complete example of building a link preview component using LinkMeta:
### React Component

```jsx
import { useState, useEffect } from 'react';

function LinkPreview({ url }) {
  const [meta, setMeta] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(`https://linkmeta.dev/api/v1/extract?url=${encodeURIComponent(url)}&fields=title,description,image,favicon`)
      .then(res => res.json())
      .then(data => {
        setMeta(data.data);
        setLoading(false);
      })
      .catch(() => setLoading(false));
  }, [url]);

  if (loading) return <div className="link-preview-skeleton" />;
  if (!meta) return <a href={url}>{url}</a>;

  return (
    <a href={url} className="link-preview" target="_blank" rel="noopener">
      {meta.image && <img src={meta.image} alt={meta.title} />}
      <div className="link-preview-content">
        <h3>{meta.title}</h3>
        <p>{meta.description}</p>
        <span className="link-preview-domain">
          {meta.favicon && <img src={meta.favicon} width="16" height="16" alt="" />}
          {new URL(url).hostname}
        </span>
      </div>
    </a>
  );
}
```
## Validating Metadata Quality

Beyond extraction, LinkMeta also validates metadata quality with a score:

```bash
curl "https://linkmeta.dev/api/v1/validate?url=https://yoursite.com"
```
This returns a validation report with:

- Score (0-100) and letter grade
- Platform readiness for Facebook, Twitter, and LinkedIn
- Specific issues, like a missing `og:image` or truncated titles
- Recommendations for improvement
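The article doesn't show the report's JSON, so as an illustration only, here's how you might gate a CI check on it. The `score` and `issues` keys here are assumptions, not documented fields — adjust to whatever the actual response contains:

```python
def passes_quality_gate(report, min_score=80):
    """Return (ok, problems) for a validation report dict.

    Assumes hypothetical 'score' (0-100) and 'issues' (list) keys.
    """
    score = report.get("score", 0)
    problems = list(report.get("issues", []))
    return (score >= min_score and not problems, problems)

ok, problems = passes_quality_gate({"score": 92, "issues": []})
print(ok)  # True
```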
## Why Choose LinkMeta?
| Feature | LinkMeta | Alternatives |
|---|---|---|
| Price | Free forever | Often paid after trial |
| API key | Not required | Usually required |
| Fields extracted | 20+ | Typically 5-10 |
| Batch support | Up to 10 URLs | Varies |
| Body text (RAG) | Yes (opt-in) | Rare |
| Favicon discovery | All sizes/types | Usually just one |
| Caching | Built-in | Often not included |
LinkMeta is part of the SoftVoyagers ecosystem — a collection of free developer tools. Generate Open Graph images with OGForge, create QR codes with QRMint, or shorten URLs with LinkShrink.
## Frequently Asked Questions
**Is there a rate limit?** Yes, but it's generous for free usage. The API handles typical development and production workloads without issues.

**Does LinkMeta cache results?** Yes, results are cached to improve response times. Use `no_cache=true` to force a fresh fetch when needed.

**Can I use LinkMeta in production?** Absolutely. LinkMeta runs on Azure with high availability and is designed for production use.

**What about CORS?** LinkMeta supports CORS, so you can call the API directly from browser JavaScript.
Start extracting metadata now — try the interactive playground or read the API documentation.