utf-8

Mar 12, 2012 at 8:21 AM
Edited Mar 12, 2012 at 8:22 AM

Hello! Great class! But it doesn't work with UTF-8 (Cyrillic symbols), and iconv doesn't help either. Could anyone help me? For example, I am trying to export <h1> привет </h1> :(

P.S. I commented out the code in PHPWord's Section.php file, but nothing changed :(

Mar 13, 2012 at 7:25 AM

Finally, I've solved the problem. Now it speaks Russian, except for one symbol: capital Р (((

Coordinator
Mar 15, 2012 at 10:24 PM

Brilliant. What was the solution?

Coordinator
Mar 16, 2012 at 7:53 PM

As you've probably found, this appears to be a problem with PHPWord itself, not with this converter (which uses PHPWord). I guess you have tried what other people have suggested and removed the utf8_encode() calls from the PHPWord code. I've just tried that myself and then put Russian text through it. As you found, it works for all Russian characters except capital Р: if you use this character, the Word document is created with an error and won't open.
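To illustrate why utf8_encode() gets in the way of text that is already UTF-8, here is a minimal sketch. It uses a pure-PHP stand-in for utf8_encode() (the helper name latin1_to_utf8 is made up for this example) so that it also runs on PHP versions where utf8_encode() is deprecated. It only shows the double encoding; it is not PHPWord's actual code.

```php
<?php
// Hypothetical stand-in for utf8_encode(): treats every input byte as an
// ISO-8859-1 character and re-encodes it as UTF-8.
function latin1_to_utf8($bytes)
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                          // ASCII stays as is
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // becomes two bytes
    }
    return $out;
}

// Capital Cyrillic Р (U+0420) as UTF-8 bytes.
$er = "\xD0\xA0";

// Encoding text that is already UTF-8 turns each of its bytes into a
// separate character, so the letter is no longer Р.
$double = latin1_to_utf8($er);

echo bin2hex($er), "\n";     // d0a0
echo bin2hex($double), "\n"; // c390c2a0
```

The two original bytes come out as four, which is why Cyrillic input shows up as garbage when it passes through such a call.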

Mar 18, 2012 at 8:31 AM

Yes, you are right...

I put $text = utf8_decode($text) in Text.php in PHPWord...

and it shows Russian, except for capital Р and a few other symbols, which come out as a blank character ( ), like the one between these brackets :)))

P.S. I tried the iconv function pretty much everywhere, but when I use it nothing is shown at all, so I don't know what to do. I got my work done using just headers; everything works, but the result is a sort of XML document or web document. I've almost reached the final result that way, but I would like to use your solution, so if I have time I will keep trying.

Mar 18, 2012 at 8:43 AM

I tried your latest version...

If I put <h1>Russian</h1><h1>Hello</h1><h1>Родина</h1> (with the capital Russian Р) in utf8-test.html,

I get an error along the lines of: can't open the file because of errors, invalid XML character .... document (row 33, column 35).

Mar 18, 2012 at 9:16 AM
Edited Mar 18, 2012 at 9:22 AM

Hmm... I tried plain PHPWord on its own (I only changed the line to $givenText = $text; in Section.php):

$PHPWord = new PHPWord();

// New portrait section
$section = $PHPWord->createSection();

// Add text elements
$section->addText('Родина World!'); // "Родина" starts with the capital Russian Р
$section->addTextBreak(2);

$section->addText('I am inline styled.', array('name' => 'Verdana', 'color' => '006699'));
$section->addTextBreak(2);

// Write the document as a Word 2007 (.docx) file
$objWriter = PHPWord_IOFactory::createWriter($PHPWord, 'Word2007');
$objWriter->save('Text.docx');

Everything's fine...

Could the problem be that the Russian Р is being treated like a <P> tag?

Coordinator
Mar 18, 2012 at 9:20 PM
Edited Mar 18, 2012 at 9:21 PM

Found the problem! Line 103 in h2d_htmlconverter.php in function h2d_clean_text(), changed:

$text = preg_replace('/\s+/', ' ', $text);

to:

$text = preg_replace('/\s+/u', ' ', $text);

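Without the u modifier, the pattern was applied to the text as single bytes rather than as UTF-8 characters, so it wasn't dealing with UTF-8 strings correctly.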

This is in the latest change set, and I can create a new release shortly.

It's still necessary to do something about the utf8_encode() calls in PHPWord in order to display UTF-8 characters.
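For anyone wondering why only capital Р broke, here is a small sketch of the regex change. One plausible explanation (an assumption, not something confirmed in this thread): Р is U+0420, which UTF-8 encodes as the bytes 0xD0 0xA0, and in byte mode some PCRE builds or locales classify 0xA0 as whitespace, because it is the non-breaking space in ISO-8859-1. In that case \s+ replaces the second byte of the letter and leaves an invalid UTF-8 sequence behind. Whether byte mode misbehaves depends on the environment, so the sketch reports that case at run time instead of promising a result.

```php
<?php
// Capital Cyrillic Р (U+0420) is the two UTF-8 bytes 0xD0 0xA0.
$er   = "\xD0\xA0";
$text = $er . "одина \n\t World!";

// Byte mode: the subject is matched byte by byte. Depending on the PCRE
// build and locale, the lone 0xA0 byte may be treated as whitespace.
$byteMode = preg_replace('/\s+/', ' ', $text);

// UTF-8 mode: the u modifier makes PCRE read whole UTF-8 characters,
// so the bytes of a multi-byte character are never matched separately.
$utf8Mode = preg_replace('/\s+/u', ' ', $text);

echo $utf8Mode, "\n"; // Родина World!

echo $byteMode === $utf8Mode
    ? "byte mode left the text intact in this environment\n"
    : "byte mode corrupted the text in this environment\n";
```

Note that with the u modifier preg_replace() returns null if the input is not valid UTF-8, so the text must really be UTF-8 by the time it reaches this call.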

Mar 19, 2012 at 11:30 AM

Great - I'll try it tonight...

Did you try setting $text = utf8_decode($text) in Text.php? Or doesn't that solve it?

Mar 19, 2012 at 12:37 PM

What can I say? THANKS a lot... it was just a "u" modifier (((( But now everything works, since I use UTF-8 almost everywhere. I commented out utf8_encode everywhere in Section.php and added $text = utf8_decode($text) to the Text.php file in PHPWord. So my task is close to being solved completely: I still have to "link" CSS styles to my text, make a TOC (table of contents), and add page numbers. Oh, and I almost forgot: I couldn't get a table with border=1. As far as I can see, the tables in your examples have no borders either.

So thanks a lot again!

Mar 19, 2012 at 12:45 PM
Edited Mar 19, 2012 at 12:46 PM

Just one short piece of advice and I'll carry on by myself: how do I apply my CSS file?

If I put it into example.html as <style type="text/css"> ..... </style>, it just comes out as text...

Am I right that I have to define an array in the styles.inc file?

For example, I have

.f4 {
    text-align: justify;
    text-indent: 35px;
    line-height: 120%;
    margin: 0px 10px 2px 20px;
}

Will it become

$styles['f4'] = array
(
)... ??? That seems rather tough )))

Coordinator
Mar 19, 2012 at 1:26 PM

Yes that's right - this array is a set of style definitions as recognised by PHPWord - and this is well documented here: http://phpword.codeplex.com/releases/view/49543#DownloadId=138036 (available from this page: http://phpword.codeplex.com/releases/view/49543).

Writing something to interpret a standard CSS file would be a very big job, and I don't think a standard CSS file would be appropriate for styling what comes out in the Word document. I think designers should consciously think about how they want their content to appear in a Word document, and create specific styles within what is possible with PHPWord.

I do plan to improve how inline styles work, particularly where you have numerical values (e.g. font size) - however I'm afraid I don't have time to do this at the moment.
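As a rough idea of what translating the .f4 rule by hand could look like, here is a minimal sketch. Everything in it is an assumption to be checked against the PHPWord documentation linked above: the key names align, spaceBefore and spaceAfter, the value 'both' for justified text, and the use of twips as the unit. The helper px_to_twips is made up for this example, and so is the flat $styles['f4'] layout, which may not match the structure styles.inc actually expects. The sketch leaves out text-indent, line-height and the left/right margins rather than guessing at keys for them.

```php
<?php
// Hypothetical helper: convert CSS pixels to twips (twentieths of a point),
// assuming 96 px per inch, so 1 px = 0.75 pt = 15 twips.
function px_to_twips($px)
{
    return (int) round($px * 15);
}

// Hand-written counterpart of the .f4 rule. CSS margin shorthand order is
// top, right, bottom, left, so "0px 10px 2px 20px" has top 0 and bottom 2.
// The key names are assumptions, not confirmed PHPWord options.
$styles = array();
$styles['f4'] = array(
    'align'       => 'both',         // text-align: justify
    'spaceBefore' => px_to_twips(0), // margin-top: 0px
    'spaceAfter'  => px_to_twips(2), // margin-bottom: 2px
);

print_r($styles['f4']);
echo px_to_twips(35), "\n"; // the 35px text-indent would be 525 twips
```

Keeping the unit conversion in one helper makes it easy to change if PHPWord turns out to expect points rather than twips for a given setting.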

Mar 19, 2012 at 2:43 PM
Edited Mar 19, 2012 at 2:46 PM

OK, thanks. I'll have some free time in 1-2 weeks to think about it...

The problem is that I get something like this from the MySQL database:

<table class="f4"> ... <p class="f4_text"> and so on. I can probably clean it up using preg_replace or something like that, and then use styles, so I will try both ways.

P.S. As I said, I implemented part of my project (export to Word/Excel) as a web document, using XML and headers, so I'll try to find a possibly easier way))) Thanks again.

P.P.S. What about the table border (=1)?

Coordinator
Mar 19, 2012 at 9:25 PM

Yep, I noticed I hadn't added in the code for styling tables or table cells. I've added this in - it's in the latest commit - see http://htmltodocx.codeplex.com/SourceControl/list/changesets (change set 9791). Note, I haven't fully tested this (but it looks alright from what I have tried so far). Also, table styling in PHPWord appears to apply to all cells within the table (e.g. setting the border applies to all the cells not just the border of the table) - I haven't had a chance to verify that with a clean version of PHPWord though.
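For reference, a bordered table in plain PHPWord might be set up roughly as below. This is a sketch under assumptions, not the converter's code: the style name 'bordered' and the helper px_to_border_size are invented for the example, and it assumes that PHPWord writes borderSize straight into the document in the unit Word uses for border widths (eighths of a point). The PHPWord calls run only if the library has been loaded; otherwise the script just prints the style array.

```php
<?php
// Hypothetical helper: HTML border="1" means a 1 px border. At 96 px per
// inch, 1 px = 0.75 pt, and Word measures border widths in eighths of a
// point, so 1 px corresponds to 6 units.
function px_to_border_size($px)
{
    return (int) round($px * 0.75 * 8);
}

$tableStyle = array(
    'borderSize'  => px_to_border_size(1),
    'borderColor' => '000000',
);

if (class_exists('PHPWord')) {
    // Only reached when PHPWord has been included beforehand.
    $PHPWord = new PHPWord();
    $section = $PHPWord->createSection();
    $PHPWord->addTableStyle('bordered', $tableStyle);

    $table = $section->addTable('bordered');
    $table->addRow();
    $table->addCell(2000)->addText('Cell 1');
    $table->addCell(2000)->addText('Cell 2');

    $objWriter = PHPWord_IOFactory::createWriter($PHPWord, 'Word2007');
    $objWriter->save('Table.docx');
} else {
    print_r($tableStyle);
}
```

If the border setting really does apply to every cell, as described above, this gives the same look as an HTML table with border=1, where each cell is outlined.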