Microsoft Word converts certain characters into “smart characters”: double quotes, dashes (em dash / en dash), bullets and so on. These characters break PHP’s XML handling (or at least they broke it for me, using simplexml_load_string!). How do you clean them? There is an old post that suggests using ereg_replace on a set of […]
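As a rough illustration of the kind of cleaning discussed here, the sketch below maps a handful of smart characters to plain ASCII with strtr. It is not the approach from the old post: ereg_replace was removed in PHP 7, so a plain lookup table is swapped in instead. The function name clean_smart_chars and the choice of replacements are my own assumptions, and the code assumes the input is already UTF-8 and that you are on PHP 7 or later (for the "\u{...}" escapes).

```php
<?php
// Hypothetical helper, not from the original post: replaces common Word
// "smart characters" with ASCII stand-ins. Assumes UTF-8 input.
function clean_smart_chars(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2022}" => '*',   // bullet
        "\u{2026}" => '...', // ellipsis
    ];

    // strtr with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}

// Made-up sample input containing smart quotes, an en dash and an apostrophe.
$xml = "<note>\u{201C}Hello\u{201D} \u{2013} it\u{2019}s me</note>";

$doc = simplexml_load_string(clean_smart_chars($xml));
echo (string) $doc, "\n";
```

If the text arrives as Windows-1252 bytes rather than UTF-8 (which may be the case for content pasted from Word), it would probably need converting first, for example with mb_convert_encoding($text, 'UTF-8', 'Windows-1252'), before the map above can match anything.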