Perforce Chronicle 2012.2/486814
API Documentation

P4Cms_Filter_DocxToText Class Reference

Filter to convert a Microsoft Word 2007 document to text.

Public Member Functions

 filter ($docx)
 Extract the text contents of a Word document.

Detailed Description

Filter to convert a Microsoft Word 2007 document to text.

This implementation uses Zend_Search_Lucene_Document_Docx to extract the text contents of a Word document (only the Word 2007 format is supported).
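
As a usage sketch: the filter takes the contents of a .docx file as a string, not a file path. The helper name docxFileToText is hypothetical. The example assumes that the Chronicle library and Zend Framework 1 are autoloadable and that the filter can be constructed without arguments, neither of which is confirmed on this page.

```php
<?php
// Hypothetical helper (not part of Chronicle): reads a .docx file from disk
// and returns its plain text via the filter documented here.
// Assumption: P4Cms_Filter_DocxToText is autoloadable and takes no
// constructor arguments.
function docxFileToText($path)
{
    $docx = file_get_contents($path);
    if ($docx === false) {
        throw new RuntimeException("Unable to read '$path'.");
    }

    $filter = new P4Cms_Filter_DocxToText();

    // filter() returns nothing for an empty string, so cast the
    // result to keep the return type a string in every case
    return (string) $filter->filter($docx);
}
```

Callers should be prepared for Zend_Search_Lucene_Document_Exception, which the filter may throw.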

Copyright:
2011-2012 Perforce Software. All rights reserved
License:
Please see LICENSE.txt in top-level folder of this distribution.
Version:
2012.2/486814

Member Function Documentation

P4Cms_Filter_DocxToText::filter ($docx)

Extract the text contents of a Word document.

Parameters:
string $docx the Docx contents to be filtered.
Returns:
string the plain text output (nothing is returned when $docx is an empty string).
Exceptions:
Zend_Search_Lucene_Document_Exception
    {
        // shortcut if we have an empty string
        if (!strlen($docx)) {
            return;
        }

        // write contents to a tmp file
        $tempFile = tempnam(sys_get_temp_dir(), 'word');
        file_put_contents($tempFile, $docx);

        $document = Zend_Search_Lucene_Document_Docx::loadDocxFile($tempFile);

        // remove the temp file
        unlink($tempFile);

        return $document->getFieldValue('body');
    }
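
The temp-file approach above can be illustrated without Zend. The following is a minimal sketch, not the Chronicle or Zend implementation: the function name sketchDocxToText is hypothetical, and it assumes PHP 5.5 or later (for try/finally) with the zip and dom extensions. It also assumes the main document part is stored at word/document.xml, which is the conventional location. Unlike the listing above, it returns an empty string for empty input and removes the temp file even when the archive cannot be read.

```php
<?php
// Hypothetical helper (not part of Chronicle or Zend): extracts paragraph
// text from DOCX contents using only the zip and dom extensions.
function sketchDocxToText($docx)
{
    // same shortcut as the listing above, but return a string
    if (!strlen($docx)) {
        return '';
    }

    // ZipArchive reads from a path, so write the contents to a temp file
    $tempFile = tempnam(sys_get_temp_dir(), 'word');
    file_put_contents($tempFile, $docx);

    try {
        $zip = new ZipArchive();
        if ($zip->open($tempFile) !== true) {
            throw new RuntimeException('Not a readable DOCX (zip) archive.');
        }
        // assumption: the main part lives at the conventional location
        $xml = $zip->getFromName('word/document.xml');
        $zip->close();
    } finally {
        // remove the temp file even if reading the archive failed
        unlink($tempFile);
    }

    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found.');
    }

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('word/document.xml is not valid XML.');
    }

    // WordprocessingML namespace used by Word 2007 documents
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    // join the text runs of each paragraph, one paragraph per line
    $paragraphs = array();
    foreach ($dom->getElementsByTagNameNS($ns, 'p') as $p) {
        $text = '';
        foreach ($p->getElementsByTagNameNS($ns, 't') as $t) {
            $text .= $t->nodeValue;
        }
        $paragraphs[] = $text;
    }

    return implode("\n", $paragraphs);
}
```

The sketch ignores headers, footers, and document properties, and paragraphs nested inside other paragraphs (for example in text boxes) would be emitted more than once.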

The documentation for this class was generated from the following file: