Author
| RSS improvements!
|
Grauw msx professional Posts: 1002 | Posted: 26 June 2004, 02:11 |
Yes! Wonderful! Great!
p.s. I think it'd be better if you didn't translate hr's (inside quotes) to long lines of underscores.
~Grauw |
|
Grauw msx professional Posts: 1002 | Posted: 26 June 2004, 02:21 |
|
|
snout
 msx legend Posts: 4991 | Posted: 26 June 2004, 02:43 |
Ah, I'll fix those inside-quote thingies, euhm.. probably Sunday |
|
Grauw msx professional Posts: 1002 | Posted: 29 June 2004, 02:48 |
Two things:
1. The RSS for the reactions on news items doesn't show the topic but a short excerpt of the content instead (weird? it works ok for photos).
2. The main link of the MRC Forum RSS doesn't point to the forum but to the frontpage... It'd be nice...
~Grauw
|
|
snout
 msx legend Posts: 4991 | Posted: 29 June 2004, 14:02 |
I've put it on the fixlist |
|
snout
 msx legend Posts: 4991 | Posted: 01 July 2004, 12:52 |
I'm in quite a hurry, but I think I've just fixed it! If not, later this week!  |
|
Grauw msx professional Posts: 1002 | Posted: 01 July 2004, 20:06 |
Yes, it seems to work now.
Another thing, please escape the & ampersands like this: &amp;. Not only in the element bodies, but also in their attributes (href=, etc). This is required by XML and it breaks my RSS reader if it isn't done. Right now I can only access the Reactions feed, the other two show up as exclamation marks. That's enough to confirm that you fixed the reaction headings, but it'd be nice if I could also look at the others.
PHP function for that: htmlentities($text, ENT_NOQUOTES, 'UTF-8');. ENT_NOQUOTES specifies that "s should not be escaped (not necessary for normal text inside tags), but use ENT_COMPAT (the default) if you want them escaped, for example in attributes. Or just use htmlentities($text) if you don't want to specify a character encoding. Ah well: http://nl.php.net/manual/en/function.htmlentities.php (will probably explain it better than I do ;p).
~Grauw |
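To make the request above concrete, here is a minimal sketch of escaping both element text and attribute values before they go into a feed. It is written in Python purely for illustration (PHP's htmlspecialchars() does a comparable job for the markup characters); the function name `rss_item`, the `<source url=...>` element and the example.org URL are assumptions for the example, not taken from the MRC feed.

```python
from xml.sax.saxutils import escape, quoteattr


def rss_item(title, link):
    """Build a tiny <item> fragment (hypothetical layout) with safe escaping."""
    # escape() replaces &, < and > in element content.
    safe_title = escape(title)
    # quoteattr() escapes the value and wraps it in suitable quotes.
    safe_link = quoteattr(link)
    return f"<item><title>{safe_title}</title><source url={safe_link}/></item>"


print(rss_item("Nuts & Milk", "http://example.org/?a=1&b=2"))
```

The ampersand is escaped in both places, so a strict XML parser should accept the fragment.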
|
snout
 msx legend Posts: 4991 | Posted: 03 July 2004, 00:59 |
ehrm, I -have- coded in PHP before, you know. I just forgot to add the html-entities thingie in the RSS-bbcode parser, that's all. It should work now.
|
|
Grauw msx professional Posts: 1002 | Posted: 03 July 2004, 17:58 |
great!
(who knows, maybe you never encoded entities... bad practice, but it'll usually work) |
|
Grauw msx professional Posts: 1002 | Posted: 09 July 2004, 01:40 |
Arrh, my RSS reader gives an XML validation error again on the MRC News RSS (and doesn't work anymore for it)... This time the culprit is: 'Miscel&aacute;nea'
XML doesn't have all those HTML character entities. Only &amp;, &lt; and &gt; (and perhaps &quot;, I am not sure, would have to look it up). So you should either use Unicode or the appropriate numerical escape (that would be &#xE1;, I think) instead...
~Grauw "making your XML life just a little harder by having an RSS reader which validates ;p"
(or rather, is built upon a 'strictly' validating XML engine)
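One possible way to apply the numerical-escape suggestion above is to rewrite HTML-only named entities as numeric character references after the text has been entity-encoded. The sketch below is in Python and uses the standard library's HTML entity table; the helper name `named_to_numeric` is hypothetical, and it assumes the input already contains entities such as &aacute; rather than raw characters.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Rewrite HTML-only named entities as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return f"&#x{name2codepoint[name]:X};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(named_to_numeric("Miscel&aacute;nea &amp; more"))
```

For the culprit above this yields Miscel&#xE1;nea, which needs no DTD to be understood.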
|
|
[D-Tail]
 msx guru Posts: 2991 | Posted: 09 July 2004, 10:23 |
Shouldn't the latter, the strictness, be determined by the DTD? Then your RSS reader would suck, because it cannot read DTDs  |
|
Vincent van Dam msx addict Posts: 372 | Posted: 09 July 2004, 11:09 |
Quote:
| XML doesn't have all those HTML character entities. Only &amp;, &lt; and &gt; (and perhaps &quot;, I am not sure, would have to look it up). So you should either use Unicode or the appropriate numerical escape (that would be &#xE1;, I think) instead...
|
You could declare all these entities in the DTD however, making them valid.
To avoid problems with &, < and (to a lesser extent) > you can also declare the data within the element as CDATA. For example:
<element>Nuts &amp; Milk</element> is the same as:
<element><![CDATA[Nuts & Milk]]></element>
If the data would be:
<element><![CDATA[Miscel&aacute;nea]]></element>
the RSS reader shouldn't have any problems with it, but it would show "Miscel&aacute;nea" (garbage in, garbage out). |
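The two options in this post (declaring entities in a DTD, and wrapping text in CDATA) can be tried out with any strict XML parser. Below is a small sketch using Python's xml.etree.ElementTree; the document strings are made up for the test, and the DTD here is an internal subset written inline, which is an assumption and not how the MRC feed is actually set up.

```python
import xml.etree.ElementTree as ET

# Escaped text and a CDATA section give the parser the same character data.
escaped = ET.fromstring("<element>Nuts &amp; Milk</element>")
cdata = ET.fromstring("<element><![CDATA[Nuts & Milk]]></element>")

# Inside CDATA nothing is expanded, so the entity stays as literal text.
literal = ET.fromstring("<element><![CDATA[Miscel&aacute;nea]]></element>")

# An entity declared in an internal DTD subset is expanded by the parser.
declared = ET.fromstring(
    '<!DOCTYPE element [<!ENTITY aacute "&#225;">]>'
    "<element>Miscel&aacute;nea</element>"
)

print(escaped.text == cdata.text)
print(literal.text)
print(declared.text)
```

Note that CDATA only protects the markup characters; it does nothing for entities that were already written into the text.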
|
IC msx professional Posts: 538 | Posted: 16 July 2004, 16:25 |
You could also convert it to normal text like:
& = and
< = less
> = greater than
I don't know what will happen with &euro; (€) or &uuml; (ü) for instance though. Is that also not valid XML?
Euh.. (&uuml;) results in a smiley |
|
Grauw msx professional Posts: 1002 | Posted: 13 August 2004, 16:15 |
The following forum thread broke my RSS reader again:
http://www.msx.org/forumtopicl3636.html
The problem is the apostrophe in the title: "Sony´s HERCULES cart". This is written in the RSS feed as '&acute;', which is a nonexistent entity in XML. XML only has the following five entities:
&amp; ( & ), &lt; ( < ), &gt; ( > ), &quot; ( " ) and &apos; ( ' ).
The title doesn't use an ' here, which could be expressed as &apos;, but an acute accent (´) standing in for a right single quote. The way to solve this is to use a numeric character reference with its Unicode code point: &#180; (decimal) or &#xB4; (hex) for the acute accent as written, or &#8217; (decimal) / &#x2019; (hex) if you substitute a real right single quote. This goes for any character beyond the five listed above that you would otherwise need an entity for because it is not in the page's character set. Well, actually the best way would be to use Unicode (more specifically: UTF-8) for msx.org, but that may be a change a bit too drastic. OTOH, PHP has this nice Latin-1-to-UTF-8 function (in its XML module), so perhaps you could serve only the RSS feed as UTF-8.
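Both routes mentioned here (numeric character references, or serving the feed as UTF-8) can be sketched in a few lines. The example below is Python rather than PHP, and the helper names `to_ascii_with_refs` and `latin1_to_utf8` are hypothetical; it assumes the site's text is stored as Latin-1, as the post suggests.

```python
def latin1_to_utf8(raw):
    """Re-encode Latin-1 bytes as UTF-8, e.g. for the feed output only."""
    return raw.decode("latin-1").encode("utf-8")


def to_ascii_with_refs(text):
    """Keep ASCII as-is and turn every other character into &#NNN;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


title = "Sony\u00b4s HERCULES cart"  # U+00B4 is the acute accent
print(to_ascii_with_refs(title))
print(latin1_to_utf8(title.encode("latin-1")))
```

Neither helper escapes &, < or >, so they would still need to be combined with ordinary XML escaping.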
[D-Tail] mentioned the DTD but that doesn't apply here as the feed doesn't have one. Declaring a DTD with entities would theoretically also solve the problem, were it not that Mozilla doesn't handle externally defined entities very well, so in effect it still wouldn't work. Besides, creating a DTD is a lot of work.
Anyways, this is not just my RSS reader's 'problem' (RSS is XML, so it's really not)... The problem with the current feed is that sometimes it is well-formed XML, and other times it is not. Therefore, the RSS feed often cannot be read with an XML reader that is *not* based on Microsoft IE's XML parser, which is very lenient. This includes, amongst others, Sage for Firefox and Forumzilla for Thunderbird, but also any site wanting to add a 'latest msx.org items' widget.
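A quick way to catch this before readers do is to run the generated feed through a strict parser and see whether it is well-formed. The following is a rough sketch in Python; the function name `well_formedness_error` and the two sample feeds are invented for the example and are far smaller than a real RSS document.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(feed_xml):
    """Return None if the feed parses, else the parser's error message."""
    try:
        ET.fromstring(feed_xml)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<rss><channel><title>Sony&#180;s HERCULES cart</title></channel></rss>"
bad = "<rss><channel><title>Sony&acute;s HERCULES cart</title></channel></rss>"

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

The first call should report no error, while the second should report the undefined entity, which is the same kind of failure a strict reader runs into.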
Anyways, as far as I can see the only issue left with the MRC's feed seems to be this one about character entities, so I hope you'll fix that and make it always work. At least it would also fix the broken news feed, which breaks on the text "Asociaci&oacute;n Amigos del MSX" (an &oacute; for the ó in it). Quite weird actually, because it does not *have* to be escaped: the character is in Latin-1 (the front page doesn't have it escaped either).
~Grauw
p.s. while you're at it, why not make your site XHTML Strict compliant and serve it as application/xhtml+xml to supporting browsers... uh... nevermind ;p. |
|
[D-Tail]
 msx guru Posts: 2991 | Posted: 13 August 2004, 16:38 |
I've tried suggesting the latter a long time ago, Grauw, but so far the MRC admin team hasn't even been able to make it HTML 4.0 Transitional compliant :\
|