Docx2txt is a Perl-based command-line utility that converts Microsoft docx documents, even corrupted ones, to reasonably formatted text files with appropriate character conversions. Besides Perl, it requires a command-line unzipping program such as unzip, 7z, pkzipc, or wzunzip.
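The unzipping program is needed because a docx file is a zip archive whose main text lives in the member `word/document.xml`. The following is a minimal Python sketch of that general idea only; it is not docx2txt's Perl implementation, it does no character conversion or damage recovery, and the function names `docx_to_text` and `make_sample_docx` are hypothetical helpers introduced for illustration.

```python
import io
import zipfile
from xml.etree import ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the text of a docx (path or file-like object), one line per paragraph."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    # Each <w:p> is a paragraph; its text is split across <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        lines.append("".join(runs))
    return "\n".join(lines)


def make_sample_docx(paragraphs):
    """Build a tiny in-memory archive with only word/document.xml (demo input, hypothetical)."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx(["Hello", "World"])))
```

The sample archive is deliberately incomplete as a Word document (it has no content-types or relationship parts); it only serves to exercise the extraction step.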
Features
- Consists of a core Perl script, wrapper Unix/Windows shell scripts, and a configuration file, with provision for a separate system-wide configuration file and individual user-level configuration files.
- The Perl script also works with input/output redirection, which makes it useful for viewing docx file content directly in editors like vim and emacs, and in file browsers like mc (Midnight Commander).
- Can recover text from damaged docx documents in many cases.
- Justifies short lines, shows hyperlinks, and performs many character conversions that are missing in MS text conversion.
- Handles bullet, decimal, letter, and roman lists, along with indentation.
- Installation via Makefiles and a Windows batch file. On non-Windows systems, the scripts and the configuration file can be installed in separate directories.
- Can conveniently be used to build a web-based docx document conversion service.
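To illustrate what rendering bullet, decimal, letter, and roman lists with indentation as plain text can involve, here is a small Python sketch. It is an illustration under assumptions, not docx2txt's actual output format: the marker shapes (`*`, `1.`, `a.`, `i.`), the two-space indent per level, the way letters continue past `z` (`aa`, `ab`, ...), and the function names are all hypothetical choices made for this example.

```python
def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letter(n):
    """1 -> a, 26 -> z, 27 -> aa (continuation scheme is an assumption of this sketch)."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def list_item(style, number, level, text):
    """Format one list item as a text line, indented two spaces per nesting level."""
    if style == "bullet":
        marker = "*"
    elif style == "decimal":
        marker = f"{number}."
    elif style == "letter":
        marker = f"{to_letter(number)}."
    elif style == "roman":
        marker = f"{to_roman(number)}."
    else:
        raise ValueError(f"unknown list style: {style}")
    return "  " * level + marker + " " + text


if __name__ == "__main__":
    print(list_item("decimal", 1, 0, "First step"))
    print(list_item("letter", 2, 1, "Sub-step"))
    print(list_item("roman", 4, 2, "Detail"))
```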
License
GNU General Public License version 2.0 (GPLv2)
User Reviews
- Docx2txt works perfectly.
- Very useful project!
- This is an excellent extractor of text from docx files. If you use CakeCMD or No-Frills Command Unzipper to unzip the docx files, it will even extract text from corrupt docx files. This works well in a CGI script providing a text extraction web service of even corrupt docx files. See my instance at saveofficedata.com.