Requirements

PHPWord and SimpleHTMLDom are included in the download from here - although you may want to get the latest versions of these. Please also see the requirements for PHPWord.

Setting up

Extract the zip archive;

Put the files onto your web server;

See also PHPWord installation and configuration.

More detailed setup details: https://htmltodocx.codeplex.com/wikipage?title=HTML%20to%20docx%20Converter%20Set%20Up%20Details&version=1

Overview

To get an idea of how this works, look at the file "example.php". Browsing to this page will cause a Word document to be downloaded - this document is the result of converting the file example_html.html file (in the example_files directory). For more detailed information, browse to documentation/index.php.

The converter supports most of the typical HTML elements - see function h2d_html_allowed_children() in h2d_htmlconverter.php for the full list. A "style sheet" can be created - and some initial styles can also be set - see function h2d_styles() in styles.inc. Note you can define some default styles (initially set - and applying to all elements initially), you can define further styles according to tags, classes, and inline styles. The attribute-value pairs associated with each of these are the attribute-value pairs used by PHPWord for any of its styles (e.g. text styles, paragraph styles).

Example

 

<?php
/**
*  Example of use of HTML to docx converter
*/

// Load the files we need:
require_once 'phpword/PHPWord.php';
require_once 'simplehtmldom/simple_html_dom.php';
require_once 'htmltodocx_converter/h2d_htmlconverter.php';
require_once 'example_files/styles.inc';

// HTML fragment we want to parse:
$html = file_get_contents('example_files/example_html.html');
 
// New Word Document:
$phpword_object = new PHPWord();
$section = $phpword_object->createSection();

// HTML Dom object:
$html_dom = new simple_html_dom();
$html_dom->load('<html><body>' . $html . '</body></html>');
// Note, we needed to nest the html in a couple of dummy elements.

// Create the dom array of elements which we are going to work on:
$html_dom_array = $html_dom->find('html',0)->children();

// Provide some initial settings:
$initial_state = array(
  // Required parameters:
  'phpword_object' => &$phpword_object, // Must be passed by reference.
  'base_root' => 'http://test.local', // Required for link elements - change it to your domain.
  'base_path' => '/htmltodocx/', // Path from base_root to whatever url your links are relative to.
  
  // Optional parameters - showing the defaults if you don't set anything:
  'current_style' => array('size' => '11'), // The PHPWord style on the top element - may be inherited by descendent elements.
  'parents' => array(0 => 'body'), // Our parent is body.
  'list_depth' => 0, // This is the current depth of any current list.
  'context' => 'section', // Possible values - section, footer or header.
  'pseudo_list' => TRUE, // NOTE: Word lists not yet supported (TRUE is the only option at present).
  'pseudo_list_indicator_font_name' => 'Wingdings', // Bullet indicator font.
  'pseudo_list_indicator_font_size' => '7', // Bullet indicator size.
  'pseudo_list_indicator_character' => 'l ', // Gives a circle bullet point with wingdings.
  'table_allowed' => TRUE, // Note, if you are adding this html into a PHPWord table you should set this to FALSE: tables cannot be nested in PHPWord.
  'treat_div_as_paragraph' => TRUE, // If set to TRUE, each new div will trigger a new line in the Word document.
      
  // Optional - no default:    
  'style_sheet' => htmltodocx_styles_example(), // This is an array (the "style sheet") - returned by htmltodocx_styles_example() here (in styles.inc) - see this function for an example of how to construct this array.
  );    

// Convert the HTML and put it into the PHPWord object
htmltodocx_insert_html($section, $html_dom_array[0]->nodes, $initial_state);

// Clear the HTML dom object:
$html_dom->clear(); 
unset($html_dom);

// Save File
$h2d_file_uri = tempnam('', 'htd');
$objWriter = PHPWord_IOFactory::createWriter($phpword_object, 'Word2007');
$objWriter->save($h2d_file_uri);

// Download the file:
header('Content-Description: File Transfer');
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename=example.docx');
header('Content-Transfer-Encoding: binary');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Pragma: public');
header('Content-Length: ' . filesize($h2d_file_uri));
ob_clean();
flush();
$status = readfile($h2d_file_uri);
unlink($h2d_file_uri);
exit;

 

Last edited Sep 30, 2013 at 7:14 PM by neilt17, version 15

Comments

No comments yet.