Sitemap URL Extractor
Extract and export URLs from sitemap.xml or robots.txt
Need an llms.txt draft built from a URL list? Use the llms.txt builder.
Pull URLs out of sitemap.xml or robots.txt fast
Sitemaps are the source of truth for what a site wants indexed. Extracting their URLs gives you a clean list for SEO audits, archive snapshots, llms.txt builders, AI training datasets, or migration checks, without writing your own parser or paying for a desktop tool.
Use the extractor when you need to
Audit a competitor's site structure
Pull every URL from their sitemap to map out content categories and depth in minutes.
Build a list for an llms.txt file
Extract URLs and feed them into the llms.txt builder to publish an LLM-friendly content map.
Migrate or archive a site
Pull all URLs before a redesign so you can set up redirects or capture an archive of the old structure.
How to extract sitemap URLs
1. Paste sitemap.xml or robots.txt content, or fetch a public URL when CORS allows.
2. Click Extract to list every URL with its lastmod, priority, and changefreq values, where present.
3. Filter or sort the list, then export it as JSON, CSV, or plain text.
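As a rough illustration of these three steps (not the tool's own code), the sketch below parses a standard `<urlset>` sitemap with Python's `xml.etree.ElementTree` and exports the entries. It assumes the sitemaps.org 0.9 namespace; the function names `parse_urlset`, `to_csv`, `to_json`, and `to_text` are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the standard sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ["loc", "lastmod", "priority", "changefreq"]


def parse_urlset(xml_text: str) -> list[dict]:
    """Return one dict per <url> entry; optional fields are None when absent or empty."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {}
        for field in FIELDS:
            value = url.findtext(f"sm:{field}", default=None, namespaces=NS)
            entry[field] = value.strip() if value else None
        entries.append(entry)
    return entries


def to_csv(entries: list[dict]) -> str:
    # csv.DictWriter writes None values as empty cells.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


def to_json(entries: list[dict]) -> str:
    return json.dumps(entries, indent=2)


def to_text(entries: list[dict]) -> str:
    # Plain-text export: one URL per line.
    return "\n".join(e["loc"] for e in entries if e["loc"])
```

To sort by recency, `sorted(entries, key=lambda e: e["lastmod"] or "", reverse=True)` is a reasonable first pass, but it compares strings, so it is only reliable when every lastmod value uses the same date format and time zone.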
Keep going
Turn URLs into llms.txt
Feed the extracted URL list into a generator that builds a clean llms.txt for AI crawlers.
Encode URL components
Encode special characters before using URLs in queries or scripts.
Test URL endpoints
Send requests to extracted URLs to verify status, redirects, or content type.
Format the JSON export
Beautify the exported JSON for inclusion in docs or downstream pipelines.
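The encoding step refers to a separate tool; as a stand-in, Python's standard `urllib.parse` shows what percent-encoding an extracted URL looks like before it is embedded in a query string. The sample URL is made up for illustration.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical page URL taken from a sitemap, to be passed as a query parameter value.
page = "https://example.com/docs/a b?lang=en&v=2"

# quote() with safe="" encodes every reserved character, including "/" and ":".
encoded = quote(page, safe="")

# unquote() reverses the encoding.
decoded = unquote(encoded)

# urlencode() builds a whole query string; by default it uses quote_plus,
# so spaces become "+" instead of "%20".
query = urlencode({"url": page, "source": "sitemap"})
```

Using `urlencode` for whole query strings avoids forgetting to escape `&` or `=` inside a value, which would otherwise split the parameter.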
Common extraction workflows
Pull every indexed URL and look for thin pages, duplicates, or missing content categories.
Extract sitemap URLs once and use them as the foundation of your llms.txt content list.
Capture the full URL inventory before changing CMS or restructuring sections.
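For the audit workflow, a minimal sketch of spotting near-duplicate URLs and counting URLs per top-level section might look like this. The normalization rules (lowercase scheme and host, drop the fragment, drop a trailing slash on non-root paths) are assumptions; a real site may treat some of these variants as distinct pages.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Hypothetical normalization: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that normalize to the same value; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


def section_counts(urls: list[str]) -> Counter:
    """Count URLs by first path segment as a rough map of content categories."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts
```

Sections with very few URLs, or a category you expected but do not see in `section_counts`, are good places to start looking for thin or missing content.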
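Related tools
FAQ
Most websites do not send CORS headers that allow other sites to read their sitemap.xml, so a failed fetch is normal. Use Fetch when it works (for example, same-site requests or sites with open CORS). Otherwise, open the sitemap in a new tab and paste the XML here, or upload the file directly; these methods always work.
A sitemap index is an XML file that lists other sitemap files instead of listing page URLs directly. If the parsed results are mostly .xml links, fetch or paste the XML of each child sitemap to collect the actual page URLs.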
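A minimal sketch of telling the two file types apart by their root element, and of following an index down to page URLs, is shown below. It assumes the sitemaps.org 0.9 namespace; `fetch` is a hypothetical hook standing in for however the XML is obtained (a network request, a file read, or pasted text), and the `max_sitemaps` cap is an arbitrary safety limit.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}


def classify_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        kind, locs = "index", root.findall("sm:sitemap/sm:loc", NS)
    elif root.tag == f"{{{SITEMAP_NS}}}urlset":
        kind, locs = "urlset", root.findall("sm:url/sm:loc", NS)
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    return kind, [loc.text.strip() for loc in locs if loc.text]


def collect_page_urls(start: str, fetch: Callable[[str], str], max_sitemaps: int = 50) -> list[str]:
    """Follow sitemap indexes until only page URLs remain."""
    pending, seen, pages = [start], set(), []
    while pending and len(seen) < max_sitemaps:
        url = pending.pop()
        if url in seen:
            continue  # guard against indexes that reference each other
        seen.add(url)
        kind, locs = classify_sitemap(fetch(url))
        if kind == "index":
            pending.extend(locs)  # child sitemaps: queue them for fetching
        else:
            pages.extend(locs)  # a <urlset>: these are real page URLs
    return pages
```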
Yes. Paste the robots.txt content and the tool extracts its Sitemap: lines and lists the corresponding URLs. If CORS allows, you can also try Fetch on each sitemap URL.
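Reading Sitemap: lines out of robots.txt can be sketched in a few lines of Python. This is an illustration under simple assumptions (comments start at `#`, the field name is matched case-insensitively, duplicates are dropped), not the tool's parser. Python's standard `urllib.robotparser.RobotFileParser` also exposes a `site_maps()` method (Python 3.8+) if you prefer a library call.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect unique URLs from Sitemap: lines, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # partition() splits at the first ":", so the URL's own "https:" stays in value
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            value = value.strip()
            if value and value not in urls:
                urls.append(value)
    return urls
```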
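No. Parsing and exporting happen entirely in your browser; nothing is sent to JSONTech servers.
Use the llms.txt Builder: copy the extracted URL list, paste it there, fill in a title and description, then download llms.txt.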
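This page does not describe the builder's exact output, so the sketch below is only an approximation: it follows the layout of the llms.txt proposal (an H1 title, a blockquote summary, then an H2 section of Markdown links). The section name "Pages" and the `link_label` rule (last path segment, or the host for the root URL) are hypothetical choices.

```python
from urllib.parse import urlsplit


def link_label(url: str) -> str:
    """Hypothetical label rule: last path segment, or the host for a root URL."""
    parts = urlsplit(url)
    path = parts.path.strip("/")
    return path.rsplit("/", 1)[-1] if path else parts.netloc


def build_llms_txt(title: str, description: str, urls: list[str], section: str = "Pages") -> str:
    """Assemble an llms.txt draft: H1 title, blockquote summary, one H2 link list."""
    lines = [f"# {title}", "", f"> {description}", "", f"## {section}", ""]
    for url in urls:
        lines.append(f"- [{link_label(url)}]({url})")
    return "\n".join(lines) + "\n"
```

Labels derived from URL slugs are usually too terse for readers, so expect to edit them (and add short notes after each link) before publishing.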