A lightweight, efficient text-content extractor, mainly for OOXML files (typically .docx, .xlsx, and .pptx).
Updated Dec 11, 2023 - Go
Java library that detects the top-level selector on an HTML page.
A Playwright-based web scraper that extracts full website content (titles, headings, paragraphs, links, images, and HTML), with optional AI analysis support.
ContentExtractor delivers instant insights via intelligent pattern recognition and automated content analysis 🐙.