XML Data appears to be invalid

Discussion about Schedules Direct grabber code and data formats.
johnsonsmythe
Posts: 13
Joined: Thu Sep 06, 2007 2:21 am

XML Data appears to be invalid

Post by johnsonsmythe »

I am getting foreign characters in my program titles and descriptions, but AFAIK these are illegal XML characters; they should be encoded using the &#xxx; method. This is breaking my parsing (which uses expat), so I can't do anything until this is fixed at the source.

This appears in all the lineups I have tested so far, but for your info: zip 90210, Dish Network.

E.g.:
<program id='SH000175460000'>
<title>Linha de Três</title>
<showType>Series</showType>
<series>EP00017546</series>
<originalAirDate>2007-09-30</originalAirDate>
</program>

<member>
<role>Actor</role>
<givenname>Antônio</givenname>
<surname>Pompêo</surname>
</member>
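
For anyone trying to reproduce this locally, here is a minimal sketch using Python's built-in expat binding. The helper name `parse_titles` and the trimmed sample document are made up for illustration, and it assumes the feed carries no encoding declaration, in which case expat expects UTF-8. Under that assumption, expat accepts the accented character both as raw UTF-8 bytes and as a numeric character reference, and only rejects it when the bytes are really ISO-8859-1, which may be what the failing pulls contained.

```python
import xml.parsers.expat


def parse_titles(data: bytes) -> list:
    """Collect the text of every <title> element (hypothetical helper)."""
    titles = []
    buf = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        if name == "title":
            in_title = True
            buf.clear()

    def end(name):
        nonlocal in_title
        if name == "title":
            # Character data can arrive in several chunks, so join them here.
            titles.append("".join(buf))
            in_title = False

    def chars(text):
        if in_title:
            buf.append(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return titles


doc = "<program id='SH000175460000'><title>Linha de Três</title></program>"

# Raw UTF-8 bytes: parses fine.
utf8_ok = parse_titles(doc.encode("utf-8"))

# Numeric character reference (ê is U+00EA, decimal 234): also parses fine.
escaped_ok = parse_titles(doc.replace("ê", "&#234;").encode("ascii"))

# ISO-8859-1 bytes with no encoding declaration: expat reports an error.
try:
    parse_titles(doc.encode("latin-1"))
    latin1_error = None
except xml.parsers.expat.ExpatError as exc:
    latin1_error = exc
```

If the third case is the one that fails for you, the problem is more likely the byte encoding of the feed than the presence of accented characters as such.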

rmeden
SD Board Member
Posts: 1579
Joined: Tue Aug 14, 2007 2:31 pm
Location: Cedar Hill, TX

Re: XML Data appears to be invalid

Post by rmeden »

Are you still seeing this? I did earlier, but it doesn't seem to be doing it now.

Either it didn't affect all the TMS servers, or they've fixed the problem.

It has happened before. XMLTV/tv_grab_na_dd has a "--dropbadchar" switch to get around it, but it's pretty CPU intensive so I don't do it unless requested.

Robert
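
As a rough idea of what a client-side workaround could look like, here is a sketch that strips bad bytes before parsing. This is not how tv_grab_na_dd implements its switch (that code is not shown in this thread); the function name `drop_bad_chars` is hypothetical, and the sketch assumes the feed is meant to be UTF-8. It discards byte sequences that are not valid UTF-8 and then removes any characters outside the XML 1.0 `Char` production.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def drop_bad_chars(raw: bytes) -> str:
    """Return text that is safe to hand to an XML parser (hypothetical helper).

    Assumes the input is supposed to be UTF-8; invalid byte sequences are
    silently dropped rather than repaired.
    """
    text = raw.decode("utf-8", errors="ignore")
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Note that this loses data: a stray ISO-8859-1 byte is dropped, not converted, so "Beyoncé" would come out as "Beyonc". Scanning the whole download this way is also extra work on every pull.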

johnsonsmythe
Posts: 13
Joined: Thu Sep 06, 2007 2:21 am

Re: XML Data appears to be invalid

Post by johnsonsmythe »

It appears to have just been fixed: it wasn't working when I posted, but it is okay now. Perhaps, as you say, there is one server that is out of sync with the others.

cwarren
Posts: 17
Joined: Fri Aug 17, 2007 5:50 am

Re: XML Data appears to be invalid

Post by cwarren »

Still seeing this kind of data in my pulls... is this how the data is going to be?

Excerpt from a pull about an hour or so ago. Is the accented e in Beyoncé legal UTF-8? From what I can find, it isn't. XML-Parser doesn't like it.

<program id='EP0168271807'>
<title>Biography</title>
<subtitle>Beyonce</subtitle>
<description>Singer Beyoncé makes acting success her next goal.</description>
<showType>Series</showType>
<series>EP016827</series>
<originalAirDate>2004-11-21</originalAirDate>
</program>
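
On the "is it legal UTF-8" question, a quick check along these lines may help narrow it down. This is only a sketch; `classify_bytes` is a made-up helper name. The character é itself is representable in UTF-8 (as the two bytes C3 A9), so what matters is which bytes actually arrive in the download.

```python
def classify_bytes(raw: bytes) -> str:
    """Report whether raw decodes as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "valid UTF-8"


text = "Singer Beyoncé makes acting success her next goal."
as_utf8 = text.encode("utf-8")      # é becomes the two bytes C3 A9
as_latin1 = text.encode("latin-1")  # é becomes the single byte E9

print(classify_bytes(as_utf8))    # valid UTF-8
print(classify_bytes(as_latin1))  # not valid UTF-8
```

Running the same check on the raw bytes of a failing pull (before any editor or terminal re-encodes them) should show whether the feed sent a lone E9 byte.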

neopelago
Posts: 16
Joined: Fri Aug 17, 2007 9:51 am

Re: XML Data appears to be invalid

Post by neopelago »

It has been my understanding that the "raw" listings I got from Zap2it and now thru XMLTV were always in UTF-8 format, not strict XML. By specifying UTF-8 for Java input file streaming, special characters have been converted into the appropriate Unicode characters without any problem. The only XML entities used in these listings are the most basic: &amp; &apos; &quot;. It might be worth asking whether Data Direct is planning to move closer to real XML in the future, in this and other things.

Larry

GameGod
Posts: 63
Joined: Fri Aug 17, 2007 12:26 pm

Re: XML Data appears to be invalid

Post by GameGod »

This problem doesn't appear to be fixed. I'm now getting these failures every other day or so. Is this going to be fixed at all?

rmeden
SD Board Member
Posts: 1579
Joined: Tue Aug 14, 2007 2:31 pm
Location: Cedar Hill, TX

Re: XML Data appears to be invalid

Post by rmeden »

GameGod wrote: This problem doesn't appear to be fixed. I'm now getting these failures every other day or so. Is this going to be fixed at all?

Need more info to replicate the issue.

I just searched the lineup johnsonsmythe mentioned (zip 90210, Dish Network) and couldn't find the programme or series ID he mentioned.

Robert

cwarren
Posts: 17
Joined: Fri Aug 17, 2007 5:50 am

Re: XML Data appears to be invalid

Post by cwarren »

Zipcode 32578
Cox Digital

rmeden
SD Board Member
Posts: 1579
Joined: Tue Aug 14, 2007 2:31 pm
Location: Cedar Hill, TX

Re: XML Data appears to be invalid

Post by rmeden »

More info... please provide everything I need to replicate this:

lineup ( zip + name)
channel
date/time
bad text

(johnsonsmythe's post is a good example)

cwarren
Posts: 17
Joined: Fri Aug 17, 2007 5:50 am

Re: XML Data appears to be invalid

Post by cwarren »

Ahh, I thought the two posts together accomplished that....
