
UTF-8 and other encoding improvements

Posted: Thu Jan 05, 2012 12:28 pm
by Hyvätti
Hi,

To get translations, email piping, and email sending working with UTF-8, I'm in the process of improving the encoding support in HESK 2.3. Getting all email headers working will take a little more effort, but here's what I have so far. The outgoing Subject: line is fixed. Non-ASCII characters in the sender's display name of the incoming From: header are not handled yet. The HTTP header changes send the charset explicitly in the Content-Type header, which makes the UTF-8 charset work across PHP environments, servers and browsers.

I also plan to create a Finnish translation. It is 70% done, and I will post it later this spring.

The code below probably will not copy cleanly, so get the patch from http://www.iki.fi/hyvatti/sw/hesk23-a.diff

Code:

diff -ru /tmp/h/inc/email_functions.inc.php ./inc/email_functions.inc.php
--- /tmp/h/inc/email_functions.inc.php	2011-10-19 19:08:58.000000000 +0300
+++ ./inc/email_functions.inc.php	2012-01-05 08:42:37.320663615 +0200
@@ -48,6 +48,12 @@
 function hesk_mail($to,$subject,$message) {
 	global $hesk_settings, $hesklang;
 
+    $subject2 = quoted_printable_encode($subject);
+    if ($subject2 != $subject) {
+      $subject2 = str_replace (' ', '_', $subject2);
+      $subject = "=?" . $hesklang['ENCODING'] . "?Q?${subject2}?=";
+    }
+
     /* Use PHP's mail function */
 	if ( ! $hesk_settings['smtp'])
     {
diff -ru /tmp/h/inc/header.inc.php ./inc/header.inc.php
--- /tmp/h/inc/header.inc.php	2011-09-15 21:36:06.000000000 +0300
+++ ./inc/header.inc.php	2011-12-19 07:07:28.107662879 +0200
@@ -35,9 +35,10 @@
 /* Check if this is a valid include */
 if (!defined('IN_SCRIPT')) {die('Invalid attempt');}
 
+header('Content-Type: text/html; charset='.$hesklang['ENCODING']);
 ?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php echo $hesk_settings['languages'][$hesk_settings['language']]['folder'] ?>" lang="<?php echo $hesk_settings['languages'][$hesk_settings['language']]['folder'] ?>">
 <head>
 	<title><?php echo (isset($hesk_settings['tmp_title']) ? $hesk_settings['tmp_title'] : $hesk_settings['hesk_title']); ?></title>
 	<meta http-equiv="Content-Type" content="text/html;charset=<?php echo $hesklang['ENCODING']; ?>" />

diff -ru /tmp/h/print.php ./print.php
--- /tmp/h/print.php	2011-09-15 21:36:08.000000000 +0300
+++ ./print.php	2011-12-19 06:58:53.475662759 +0200
@@ -72,6 +72,7 @@
 $sql = "SELECT * FROM `".hesk_dbEscape($hesk_settings['db_pfix'])."replies` WHERE `replyto`='".hesk_dbEscape($ticket['id'])."' ORDER BY `id` ASC";
 $res  = hesk_dbQuery($sql);
 $replies = hesk_dbNumRows($res);
+header('Content-Type: text/html; charset='.$hesklang['ENCODING']);
 ?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <html>
I hope this helps the developers or someone else. If you can figure out how to correctly decode the sender's name from encoded incoming email headers, I would appreciate it.
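
For readers adapting the Subject: change, here is a minimal sketch of RFC 2047 "Q" encoding written by hand instead of on top of quoted_printable_encode(). The function name encode_header_q() is hypothetical and not part of HESK. It escapes every byte except ASCII letters and digits, so "?", "=" and "_" inside the subject cannot be confused with the encoded-word syntax. It does not split long subjects into several encoded-words of at most 75 characters, which RFC 2047 asks for, so treat it as an illustration only.

```php
<?php
// Hypothetical helper, not part of HESK: RFC 2047 "Q" encoding of one header value.
function encode_header_q(string $text, string $charset = 'UTF-8'): string
{
    // Plain printable ASCII can be sent as-is.
    if (preg_match('/\A[\x20-\x7E]*\z/', $text)) {
        return $text;
    }

    // Work byte by byte (no /u flag). Letters and digits stay literal,
    // a space becomes "_", every other byte becomes "=XX" in upper-case hex.
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9]/',
        function ($m) {
            return $m[0] === ' ' ? '_' : sprintf('=%02X', ord($m[0]));
        },
        $text
    );

    return '=?' . $charset . '?Q?' . $encoded . '?=';
}
```

In hesk_mail() the call would presumably look like encode_header_q($subject, $hesklang['ENCODING']), assuming $subject is already stored in that charset.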

Regards,
Jaakko

Re: UTF-8 and other encoding improvements

Posted: Fri Jan 06, 2012 3:04 pm
by Klemen
Thanks for sharing your work.

I plan to move HESK to UTF-8 exclusively for all languages in the future (I should have done this from the start; it would have saved a lot of trouble). I'm not sure whether version 2.4 will already have this, but it will definitely be done.

Incoming messages are delicate because they can arrive in any encoding, which you then have to convert to UTF-8 (for example with mb_convert_encoding()). The "mime_parser.php" class (in inc/mail) is very powerful; you may want to dig through it, as it may help you.
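
As a rough illustration of the conversion step, the sketch below wraps mb_convert_encoding() so that a missing, matching, or unknown charset label leaves the text untouched. The function name to_target_charset() is hypothetical and not part of HESK, and the code assumes the mbstring extension is loaded and that the mail parser has already produced the raw bytes together with the charset declared in the message.

```php
<?php
// Hypothetical helper, not part of HESK: convert text from a declared charset.
function to_target_charset(string $text, string $from, string $target = 'UTF-8'): string
{
    // No declared charset, or it already matches the target: nothing to do.
    if ($from === '' || strcasecmp($from, $target) === 0) {
        return $text;
    }

    try {
        // PHP 7 warns and returns false on an unknown charset name,
        // PHP 8 throws ValueError. Both cases fall back to the input.
        $converted = @mb_convert_encoding($text, $target, $from);
    } catch (\ValueError $e) {
        return $text;
    }

    return is_string($converted) ? $converted : $text;
}
```

Where the iconv extension is available, PHP's iconv_mime_decode() can also decode RFC 2047 encoded-words in a raw header line, which may be an alternative when the parser does not report the charset separately.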

Sorry I can't give any specific help at the moment, but when I get around to tackling it, it will surely be included in HESK.

Re: UTF-8 and other encoding improvements

Posted: Sat Jan 14, 2012 10:29 pm
by Hyvätti
Hi,

I finally fixed the last piece that did not support non-ASCII characters: the sender name of incoming email. Customer names no longer get mangled. The diff is below. The Finnish translation is also done, as you can see in the AddOns section. The diff also adds some date formats, most notably ISO 8601.

http://www.iki.fi/hyvatti/sw/hesk23-fi.diff
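
The three date formats the calendar change accepts (mm/dd/yyyy, dd.mm.yyyy and the ISO 8601 form yyyy-mm-dd) can be sketched on the server side as follows. This is only an illustration in PHP of the same matching order; parse_ticket_date() is a hypothetical name, HESK itself does this in JavaScript, and the two-digit-year expansion from the calendar script is left out.

```php
<?php
// Hypothetical helper, not part of HESK: accept three date notations.
function parse_ticket_date(string $s): ?array
{
    if (preg_match('#^\s*(\d{1,2})/(\d{1,2})/(\d{2,4})\s*$#', $s, $m)) {
        // mm/dd/yyyy
        [$month, $day, $year] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    } elseif (preg_match('#^\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{2,4})\s*$#', $s, $m)) {
        // dd.mm.yyyy, optional space after each dot
        [$day, $month, $year] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    } elseif (preg_match('#^\s*(\d{2,4})-(\d{1,2})-(\d{1,2})\s*$#', $s, $m)) {
        // yyyy-mm-dd (ISO 8601 calendar date)
        [$year, $month, $day] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    } else {
        return null; // none of the three notations matched
    }

    // Reject impossible dates such as February 30.
    if (!checkdate($month, $day, $year)) {
        return null;
    }

    return ['year' => $year, 'month' => $month, 'day' => $day];
}
```

Checking the US format first means an ambiguous input with slashes is always read as month/day, which matches the order of the tests in the patched f_tcalParseDate.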

Code:

diff -ru /tmp/h/inc/calendar/calendar_js.php ./inc/calendar/calendar_js.php
--- /tmp/h/inc/calendar/calendar_js.php	2011-09-15 21:36:06.000000000 +0300
+++ ./inc/calendar/calendar_js.php	2011-12-20 03:56:19.100948433 +0200
@@ -61,11 +61,23 @@
 function f_tcalParseDate (s_date) {
 
         var re_date = /^\s*(\d{1,2})\/(\d{1,2})\/(\d{2,4})\s*$/;
-        if (!re_date.exec(s_date))
+        var re_date1 = /^\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{2,4})\s*$/;
+        var re_date2 = /^\s*(\d{2,4})-(\d{1,2})-(\d{1,2})\s*$/;
+        var n_day, n_month, n_year;
+        if (re_date.exec(s_date)) {
+	  n_day = Number(RegExp.$2);
+	  n_month = Number(RegExp.$1);
+	  n_year = Number(RegExp.$3);
+	} else if (re_date1.exec(s_date)) {
+	  n_day = Number(RegExp.$1);
+	  n_month = Number(RegExp.$2);
+	  n_year = Number(RegExp.$3);
+	} else if (re_date2.exec(s_date)) {
+	  n_day = Number(RegExp.$3);
+	  n_month = Number(RegExp.$2);
+	  n_year = Number(RegExp.$1);
+	} else
                 return alert ("<?php echo $hesklang['cinv']; ?>: '" + s_date + "'.\n<?php echo $hesklang['cinv2']; ?>.")
-        var n_day = Number(RegExp.$2),
-                n_month = Number(RegExp.$1),
-                n_year = Number(RegExp.$3);
 
         if (n_year < 100)
                 n_year += (n_year < this.a_tpl.centyear ? 2000 : 1900);
diff -ru /tmp/h/inc/email_functions.inc.php ./inc/email_functions.inc.php
--- /tmp/h/inc/email_functions.inc.php	2011-10-19 19:08:58.000000000 +0300
+++ ./inc/email_functions.inc.php	2012-01-11 12:57:52.440440463 +0200
@@ -48,6 +48,12 @@
 function hesk_mail($to,$subject,$message) {
 	global $hesk_settings, $hesklang;
 
+    $subject2 = quoted_printable_encode($subject);
+    if ($subject2 != $subject) {
+      $subject2 = str_replace (' ', '_', $subject2);
+      $subject = "=?" . $hesklang['ENCODING'] . "?Q?${subject2}?=";
+    }
+
     /* Use PHP's mail function */
 	if ( ! $hesk_settings['smtp'])
     {
@@ -82,7 +88,7 @@
                 "Reply-To: $hesk_settings[noreply_mail]",
                 "Return-Path: $hesk_settings[webmaster_mail]",
 				"Subject: " . $subject,
-				"Date: ".strftime("%a, %d %b %Y %H:%M:%S %Z"),
+				"Date: ".date(DATE_RFC2822),
                 "Content-Type: text/plain; charset=".$hesklang['ENCODING']
 			), $message))
     {
diff -ru /tmp/h/inc/header.inc.php ./inc/header.inc.php
--- /tmp/h/inc/header.inc.php	2011-09-15 21:36:06.000000000 +0300
+++ ./inc/header.inc.php	2011-12-19 07:07:28.107662879 +0200
@@ -35,9 +35,10 @@
 /* Check if this is a valid include */
 if (!defined('IN_SCRIPT')) {die('Invalid attempt');}
 
+header('Content-Type: text/html; charset='.$hesklang['ENCODING']);
 ?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php echo $hesk_settings['languages'][$hesk_settings['language']]['folder'] ?>" lang="<?php echo $hesk_settings['languages'][$hesk_settings['language']]['folder'] ?>">
 <head>
 	<title><?php echo (isset($hesk_settings['tmp_title']) ? $hesk_settings['tmp_title'] : $hesk_settings['hesk_title']); ?></title>
 	<meta http-equiv="Content-Type" content="text/html;charset=<?php echo $hesklang['ENCODING']; ?>" />
diff -ru /tmp/h/print.php ./print.php
--- /tmp/h/print.php	2011-09-15 21:36:08.000000000 +0300
+++ ./print.php	2011-12-19 06:58:53.475662759 +0200
@@ -72,6 +72,7 @@
 $sql = "SELECT * FROM `".hesk_dbEscape($hesk_settings['db_pfix'])."replies` WHERE `replyto`='".hesk_dbEscape($ticket['id'])."' ORDER BY `id` ASC";
 $res  = hesk_dbQuery($sql);
 $replies = hesk_dbNumRows($res);
+header('Content-Type: text/html; charset='.$hesklang['ENCODING']);
 ?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <html>
diff -ru /tmp/h/inc/mail/email_parser.php ./inc/mail/email_parser.php
--- /tmp/h/inc/mail/email_parser.php	2011-09-15 21:36:06.000000000 +0300
+++ ./inc/mail/email_parser.php	2012-01-15 00:07:27.809378260 +0200
@@ -171,14 +171,17 @@
   foreach($email_info as $info){
     $address = "";
     $name = "";
+    $encoding = "";
     if ( array_key_exists("address", $info) ){
       $address = $info["address"];
     }
     if ( array_key_exists("name", $info) ){
       $name = $info["name"];
     }
-    
-    $result[] = array("address"=>$address,"name"=>$name);
+    if ( array_key_exists("encoding", $info) ){
+      $encoding = $info["encoding"];
+    }
+    $result[] = array("address"=>$address,"name"=>$name,"encoding"=>$encoding);
   }
 
   return $result;
diff -ru /tmp/h/inc/mail/hesk_pipe.php ./inc/mail/hesk_pipe.php
--- /tmp/h/inc/mail/hesk_pipe.php	2012-01-05 14:16:22.529670777 +0200
+++ ./inc/mail/hesk_pipe.php	2012-01-15 00:07:22.335410549 +0200
@@ -58,6 +58,13 @@
 
 /* Variables */
 $tmpvar['name']	    = hesk_input($results['from'][0]['name']) or $tmpvar['name'] = $hesklang['unknown'];
+if (!empty($results['from'][0]['encoding']))
+{
+	if (strtolower($results['from'][0]['encoding']) != strtolower($hesklang['ENCODING']))
+    {
+		$tmpvar['name']=mb_convert_encoding($tmpvar['name'],$hesklang['ENCODING'],$results['from'][0]['encoding']);
+    }
+}
 $tmpvar['email']	= hesk_validateEmail($results['from'][0]['address'],'ERR',0);
 $tmpvar['category'] = 1;
 $tmpvar['priority'] = 3;