You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
"## web_scrape_html\n\n`client.web.webScrapeHTML(url: string, maxAgeMs?: number, parsePDF?: boolean): { html: string; success: true; url: string; }`\n\n**get** `/web/scrape/html`\n\nScrapes the given URL and returns the raw HTML content of the page.\n\n### Parameters\n\n- `url: string`\n Full URL to scrape (must include http:// or https:// protocol)\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `parsePDF?: boolean`\n When true (default), PDF URLs are fetched and their text layer is extracted and returned wrapped in <html><pdf>…</pdf></html>. When false, PDF URLs are skipped and a 400 WEBSITE_ACCESS_ERROR is returned.\n\n### Returns\n\n- `{ html: string; success: true; url: string; }`\n\n - `html: string`\n - `success: true`\n - `url: string`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webScrapeHTML({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
64
+
"## web_scrape_html\n\n`client.web.webScrapeHTML(url: string, includeFrames?: boolean, maxAgeMs?: number, parsePDF?: boolean): { html: string; success: true; url: string; }`\n\n**get** `/web/scrape/html`\n\nScrapes the given URL and returns the raw HTML content of the page.\n\n### Parameters\n\n- `url: string`\n Full URL to scrape (must include http:// or https:// protocol)\n\n- `includeFrames?: boolean`\n When true, iframes are rendered inline into the returned HTML.\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `parsePDF?: boolean`\n When true (default), PDF URLs are fetched and their text layer is extracted and returned wrapped in <html><pdf>…</pdf></html>. When false, PDF URLs are skipped and a 400 WEBSITE_ACCESS_ERROR is returned.\n\n### Returns\n\n- `{ html: string; success: true; url: string; }`\n\n - `html: string`\n - `success: true`\n - `url: string`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webScrapeHTML({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
"## web_scrape_md\n\n`client.web.webScrapeMd(url: string, includeImages?: boolean, includeLinks?: boolean, maxAgeMs?: number, parsePDF?: boolean, shortenBase64Images?: boolean, useMainContentOnly?: boolean): { markdown: string; success: true; url: string; }`\n\n**get** `/web/scrape/markdown`\n\nScrapes the given URL into LLM usable Markdown.\n\n### Parameters\n\n- `url: string`\n Full URL to scrape into LLM usable Markdown (must include http:// or https:// protocol)\n\n- `includeImages?: boolean`\n Include image references in Markdown output\n\n- `includeLinks?: boolean`\n Preserve hyperlinks in Markdown output\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `parsePDF?: boolean`\n When true (default), PDF URLs are fetched and their text layer is extracted and converted to Markdown. When false, PDF URLs are skipped and a 400 WEBSITE_ACCESS_ERROR is returned.\n\n- `shortenBase64Images?: boolean`\n Shorten base64-encoded image data in the Markdown output\n\n- `useMainContentOnly?: boolean`\n Extract only the main content of the page, excluding headers, footers, sidebars, and navigation\n\n### Returns\n\n- `{ markdown: string; success: true; url: string; }`\n\n - `markdown: string`\n - `success: true`\n - `url: string`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webScrapeMd({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
107
+
"## web_scrape_md\n\n`client.web.webScrapeMd(url: string, includeFrames?: boolean, includeImages?: boolean, includeLinks?: boolean, maxAgeMs?: number, parsePDF?: boolean, shortenBase64Images?: boolean, useMainContentOnly?: boolean): { markdown: string; success: true; url: string; }`\n\n**get** `/web/scrape/markdown`\n\nScrapes the given URL into LLM usable Markdown.\n\n### Parameters\n\n- `url: string`\n Full URL to scrape into LLM usable Markdown (must include http:// or https:// protocol)\n\n- `includeFrames?: boolean`\n When true, the contents of iframes are rendered to Markdown.\n\n- `includeImages?: boolean`\n Include image references in Markdown output\n\n- `includeLinks?: boolean`\n Preserve hyperlinks in Markdown output\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `parsePDF?: boolean`\n When true (default), PDF URLs are fetched and their text layer is extracted and converted to Markdown. When false, PDF URLs are skipped and a 400 WEBSITE_ACCESS_ERROR is returned.\n\n- `shortenBase64Images?: boolean`\n Shorten base64-encoded image data in the Markdown output\n\n- `useMainContentOnly?: boolean`\n Extract only the main content of the page, excluding headers, footers, sidebars, and navigation\n\n### Returns\n\n- `{ markdown: string; success: true; url: string; }`\n\n - `markdown: string`\n - `success: true`\n - `url: string`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webScrapeMd({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
"## web_crawl_md\n\n`client.web.webCrawlMd(url: string, followSubdomains?: boolean, includeImages?: boolean, includeLinks?: boolean, maxAgeMs?: number, maxDepth?: number, maxPages?: number, parsePDF?: boolean, shortenBase64Images?: boolean, urlRegex?: string, useMainContentOnly?: boolean): { metadata: object; results: object[]; }`\n\n**post** `/web/crawl`\n\nPerforms a crawl starting from a given URL, extracts page content as Markdown, and returns results for all crawled pages.\n\n### Parameters\n\n- `url: string`\n The starting URL for the crawl (must include http:// or https:// protocol)\n\n- `followSubdomains?: boolean`\n When true, follow links on subdomains of the starting URL's domain (e.g. docs.example.com when starting from example.com). www and apex are always treated as equivalent.\n\n- `includeImages?: boolean`\n Include image references in the Markdown output\n\n- `includeLinks?: boolean`\n Preserve hyperlinks in the Markdown output\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `maxDepth?: number`\n Maximum link depth from the starting URL (0 = only the starting page)\n\n- `maxPages?: number`\n Maximum number of pages to crawl. Hard cap: 500.\n\n- `parsePDF?: boolean`\n When true (default), PDF pages are fetched and their text layer is extracted and converted to Markdown alongside HTML pages. When false, PDF pages are skipped entirely (not included in results and not counted as failures).\n\n- `shortenBase64Images?: boolean`\n Truncate base64-encoded image data in the Markdown output\n\n- `urlRegex?: string`\n Regex pattern. Only URLs matching this pattern will be followed and scraped.\n\n- `useMainContentOnly?: boolean`\n Extract only the main content, stripping headers, footers, sidebars, and navigation\n\n### Returns\n\n- `{ metadata: { maxCrawlDepth: number; numFailed: number; numSkipped: number; numSucceeded: number; numUrls: number; }; results: { markdown: string; metadata: { crawlDepth: number; statusCode: number; success: boolean; title: string; url: string; }; }[]; }`\n\n - `metadata: { maxCrawlDepth: number; numFailed: number; numSkipped: number; numSucceeded: number; numUrls: number; }`\n - `results: { markdown: string; metadata: { crawlDepth: number; statusCode: number; success: boolean; title: string; url: string; }; }[]`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webCrawlMd({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
268
+
"## web_crawl_md\n\n`client.web.webCrawlMd(url: string, followSubdomains?: boolean, includeFrames?: boolean, includeImages?: boolean, includeLinks?: boolean, maxAgeMs?: number, maxDepth?: number, maxPages?: number, parsePDF?: boolean, shortenBase64Images?: boolean, urlRegex?: string, useMainContentOnly?: boolean): { metadata: object; results: object[]; }`\n\n**post** `/web/crawl`\n\nPerforms a crawl starting from a given URL, extracts page content as Markdown, and returns results for all crawled pages.\n\n### Parameters\n\n- `url: string`\n The starting URL for the crawl (must include http:// or https:// protocol)\n\n- `followSubdomains?: boolean`\n When true, follow links on subdomains of the starting URL's domain (e.g. docs.example.com when starting from example.com). www and apex are always treated as equivalent.\n\n- `includeFrames?: boolean`\n When true, the contents of iframes are rendered to Markdown for each crawled page.\n\n- `includeImages?: boolean`\n Include image references in the Markdown output\n\n- `includeLinks?: boolean`\n Preserve hyperlinks in the Markdown output\n\n- `maxAgeMs?: number`\n Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.\n\n- `maxDepth?: number`\n Maximum link depth from the starting URL (0 = only the starting page)\n\n- `maxPages?: number`\n Maximum number of pages to crawl. Hard cap: 500.\n\n- `parsePDF?: boolean`\n When true (default), PDF pages are fetched and their text layer is extracted and converted to Markdown alongside HTML pages. When false, PDF pages are skipped entirely (not included in results and not counted as failures).\n\n- `shortenBase64Images?: boolean`\n Truncate base64-encoded image data in the Markdown output\n\n- `urlRegex?: string`\n Regex pattern. Only URLs matching this pattern will be followed and scraped.\n\n- `useMainContentOnly?: boolean`\n Extract only the main content, stripping headers, footers, sidebars, and navigation\n\n### Returns\n\n- `{ metadata: { maxCrawlDepth: number; numFailed: number; numSkipped: number; numSucceeded: number; numUrls: number; }; results: { markdown: string; metadata: { crawlDepth: number; statusCode: number; success: boolean; title: string; url: string; }; }[]; }`\n\n - `metadata: { maxCrawlDepth: number; numFailed: number; numSkipped: number; numSucceeded: number; numUrls: number; }`\n - `results: { markdown: string; metadata: { crawlDepth: number; statusCode: number; success: boolean; title: string; url: string; }; }[]`\n\n### Example\n\n```typescript\nimport ContextDev from 'context.dev';\n\nconst client = new ContextDev();\n\nconst response = await client.web.webCrawlMd({ url: 'https://example.com' });\n\nconsole.log(response);\n```",
0 commit comments