phpoffice - How do I extract the text content from a word document with PHP?

Solution:
Create the reader first, then check that it can read the file before loading it:
$source = "word.doc";
// create your reader object
$phpWordReader = \PhpOffice\PhpWord\IOFactory::createReader('MsDoc');
// load the source only if the reader can handle it
if ($phpWordReader->canRead($source)) {
    $phpWord = $phpWordReader->load($source);
    // ... rest of your code
}
Answer is based on this example and API documentation
Answer
Solution:
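Rather than checking each element class for text individually, you can test whether the element has a getText() method:
$text = '';
$sections = $phpWord->getSections();
foreach ($sections as $s) {
    $els = $s->getElements();
    /** @var \PhpOffice\PhpWord\Element\AbstractElement $e */
    foreach ($els as $e) {
        // elements that carry text expose getText(); anything else becomes a line break
        if (method_exists($e, 'getText')) {
            $text .= $e->getText();
        } else {
            $text .= "\n";
        }
    }
}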
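The loop above only looks at the direct children of each section, so text inside nested containers can be missed. A minimal sketch of a recursive variant follows, assuming that containers expose getElements() and that getText() may return either a string or another element. The helper names collectText() and extractText() are hypothetical and not part of PHPWord, and table rows and cells, which use other accessors, are not handled here.

```php
<?php
// Hypothetical helper: gathers text from any object that exposes
// getElements() and/or getText(), descending into nested containers.
function collectText(object $element): string
{
    // Containers: concatenate the text of every child.
    if (method_exists($element, 'getElements')) {
        $out = '';
        foreach ($element->getElements() as $child) {
            $out .= collectText($child);
        }
        return $out;
    }
    // Leaves: getText() may return a string, null, or a nested element.
    if (method_exists($element, 'getText')) {
        $value = $element->getText();
        return is_object($value) ? collectText($value) : (string) $value;
    }
    // Elements without text (page breaks, images, ...) contribute nothing.
    return '';
}

// Hypothetical helper: one output line per top-level element of each section.
function extractText(iterable $sections): string
{
    $lines = [];
    foreach ($sections as $section) {
        foreach ($section->getElements() as $element) {
            $lines[] = collectText($element);
        }
    }
    return implode("\n", $lines);
}
```

With a loaded document this would be called as extractText($phpWord->getSections()).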
Answer
Solution:
You can extract text from a Word document using catdoc: http://www.wagner.pp.ru/~vitus/software/catdoc/
It can be installed on Ubuntu using
sudo apt-get install catdoc
Once catdoc is working on your system, you can call it from PHP using shell_exec():
<?php
$text = shell_exec('/(fullpath)/catdoc /(fullpath)/word.doc');
print $text;
?>
Be sure to substitute (fullpath) with the actual path to catdoc and your word doc.
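If the document path comes from user input or may contain spaces, it is safer to quote each argument before handing the command to shell_exec(). A minimal sketch, assuming a Unix-like shell; the helper name buildCatdocCommand() and the paths in the usage note are hypothetical:

```php
<?php
// Hypothetical helper: builds the catdoc command line with the binary path
// and the document path each quoted for the shell.
function buildCatdocCommand(string $catdocPath, string $docPath): string
{
    return escapeshellarg($catdocPath) . ' ' . escapeshellarg($docPath);
}
```

It would be used as $text = shell_exec(buildCatdocCommand('/usr/bin/catdoc', '/tmp/my report.doc')); with your own paths in place of these.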
EDIT ---- Addition
If you can save your files as .docx rather than .doc, it is a little easier: a .docx file is a zip archive, so you can use unzip rather than catdoc.
Simply replace:
$text = shell_exec('/(fullpath)/catdoc /(fullpath)/word.doc');
with
$text = shell_exec("/(fullpath)/unzip -p /(fullpath)/word.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'");
You can use the same technique with most other command-line document-to-text converters: just replace the command in shell_exec() with one that works on your system.
For other Unix/Linux alternatives, see "How to extract just plain text from .doc & .docx files? (unix)".
For other PHP alternatives, see "How to extract text from word file .doc,docx,.xlsx,.pptx php".
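The unzip approach can also be done without leaving PHP. A minimal sketch, assuming the zip extension (ZipArchive) is available; the function name docxToText() is hypothetical, and like the sed pipeline it simply strips tags, so formatting, headers, and footnotes are not handled:

```php
<?php
// Hypothetical helper: reads word/document.xml from a .docx archive and
// returns its text with one line per paragraph.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("word/document.xml not found in $path");
    }
    // Mark paragraph ends with newlines before removing the markup.
    $xml = str_replace('</w:p>', "\n", $xml);
    // Strip the remaining tags and decode XML entities such as &amp;.
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

This avoids shell_exec() entirely, which can help on hosts where shell access is disabled.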