Personal Research (XMLTV) Things that may not be obvious

Discussion about Schedules Direct grabber code and data formats.
hnrsoftware
Posts: 2
Joined: Mon Oct 15, 2007 7:28 am

Post by hnrsoftware »

Context: a personal research project using the dd-data output of XMLTV.exe on Windows XP. I will post a few things that I've "discovered", and I invite comments or additional insights. Personally, I'm not wild about XML, but the raw data download (dd-data) file is orders of magnitude easier to work with than screen scraping. I have a few minor quibbles with data normalization, but it is very straightforward to decode.

1. Read the "\doc\man\tv_grab_na_dd." file. Read it several times and study it. Virtually everything you need to know is there; sometimes it is subtle, but it is usually there.

2. Manual or batch execution of XMLTV.exe is easy, and it is an instructive way to examine how the various options work. I just run .bat files from the command line to play with things.

3. The "--download-only" option produces (for me) a 4.5 MB html-type file in 5-10 seconds (DSL) for the 30 or so channels that I am interested in, out of a DirecTV lineup that could amount to 300-400 channels of mostly garbage. I suppose I will try some all-channel downloads just to see how bad it is, but I REALLY don't care about most of it. The XML processing that could be allowed to continue from there seems to take about a minute, but this is only useful if you have high-level XML tools to process or display the result. The direction I'm using produces searchable/sortable files in about 5 seconds of additional processing, but this is only reasonable because I already have a lot of support code to handle things in my own format. That makes it about 5 seconds of download and 5 seconds of processing for 30 channels and 1 week of data.

4. The ".conf" file defaults to an odd ".xmltv" folder. It is a CRLF text file with essential fields of username:, password:, timezone:, and lineup:. The channel: and not channel: lines seem to be relevant only to the further XML processing and can be eliminated.

5. It is your on-line Schedules Direct account that controls what channel data will be downloaded every time (not the config file). I have not tested it, but if I read the doc file right, if you designate multiple lineups online, all the channels you specify in all the lineups will be downloaded every time you make any data request. This needs further examination.

6. The documentation does mention it, but it is not obvious that normal Tribune Media Services (TMS) operations will unilaterally change your channel settings in your SD online configuration, adding or changing whatever they feel like. In the last couple of weeks, I've found 20 or so channels added at random times to my 30 selected channels. This is very annoying from a data processing standpoint. You will need to monitor the channels sent with every data download and hand-edit your SD account accordingly.

7. The dd-data download definitely interacts with caching at some point. The first of two identical retrievals takes maybe 10-15 seconds, the second 5-10 seconds. I can't visualize caching on my PC that would make sense, so it must be on the SD or TMS servers that the data is more available.
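
As an illustration of point 4, here is a minimal Python sketch that reads a config of that shape. It assumes each line is a plain "key: value" pair, which is only inferred from the field names listed above; the sample values, the "#" comment handling, and the function names are made up for the example and are not taken from the XMLTV documentation.

```python
# Hypothetical reader for a tv_grab_na_dd-style config file.
# Assumption: every meaningful line looks like "key: value".
REQUIRED_FIELDS = ("username", "password", "timezone", "lineup")

def parse_conf(text):
    """Return (settings, channel_lines) parsed from the config text."""
    settings = {}
    channel_lines = []
    for raw in text.splitlines():  # splitlines() copes with CRLF endings
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line, skip it
        key, value = key.strip(), value.strip()
        if key in ("channel", "not channel"):
            # Only relevant to the later XMLTV conversion phase.
            channel_lines.append((key, value))
        else:
            settings[key] = value
    return settings, channel_lines

def missing_fields(settings):
    """List the essential fields that are absent from the settings."""
    return [name for name in REQUIRED_FIELDS if name not in settings]

# Made-up sample content, using CRLF line endings as on Windows.
SAMPLE = (
    "username: alice\r\n"
    "password: s3cret\r\n"
    "timezone: -0500\r\n"
    "lineup: EXAMPLE-LINEUP\r\n"
    "channel: 10021\r\n"
    "not channel: 99999\r\n"
)

settings, channel_lines = parse_conf(SAMPLE)
print(settings["lineup"], missing_fields(settings))
```

Keeping the channel lines separate mirrors the observation that they do not affect the raw download.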

Those are all the brilliant insights I have for the moment. I really like working with this data, and $20/year is nothing compared to the value of the data.

Howard

rmeden
SD Board Member
Posts: 1563
Joined: Tue Aug 14, 2007 2:31 pm
Location: Cedar Hill, TX

Re: Personal Research (XMLTV) Things that may not be obvious

Post by rmeden »

Maybe I can clarify a few things you mentioned.

Let me start by saying that tv_grab_na_dd's --dd-data and --download-only options were meant for debugging purposes. The primary purpose of tv_grab_na_dd is to output data in XMLTV format. But it was trivial to add --download-only and folks requested it, so I did it.
hnrsoftware wrote: 4. The ".conf" file defaults to an odd ".xmltv" folder. It is a CRLF text file with essential fields of username:, password:, timezone:, and lineup:. The channel: and not channel: lines seem to be relevant only to the further XML processing and can be eliminated.
All the XMLTV tools put their config files in a .xmltv folder. ".name" files are hidden by default on Linux/Unix systems (same as Windows attrib +h) and are commonly used for config files.

You're correct the channel/not channel entries are not used for --download-only. They are used for local filtering in the XMLTV conversion phase.
hnrsoftware wrote:5. It is your on-line Schedules Direct account that controls what channel data will be downloaded every time (not the config file). I have not tested it, but if I read the doc file right, if you designate multiple lineups online, all the channels you specify in all the lineups will be downloaded every time you make any data request. This needs further examination.
Tribune (via the SD front-end) controls what data is downloaded. And yes, all lineups are always downloaded from TMS in one pull. The XMLTV format doesn't support multiple lineups, so only one lineup is output in XMLTV format per run, and the grabber provides local filters for channels. You can use --dd-data and --reprocess to prevent repeated downloads when processing multiple lineups. Again, this is only meaningful if you want XMLTV formatted data.
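
A rough Python sketch of that download-once, reprocess-per-lineup workflow follows. It only builds the argument lists and does not run anything. The --dd-data, --download-only and --reprocess options are the ones named in this thread; using --config-file and --output to select a per-lineup config and output file is an assumption here, as are the file names and the "xmltv.exe tv_grab_na_dd" invocation, so check the man page before relying on any of it.

```python
# Sketch only: builds command lines, nothing is executed.
# Assumptions: --config-file and --output behave like the usual XMLTV
# grabber options; all file names below are hypothetical.

def build_commands(dd_file, lineup_configs, grabber=("xmltv.exe", "tv_grab_na_dd")):
    """Return one download command plus one reprocess command per lineup.

    lineup_configs is a list of (config_file, output_file) pairs.
    """
    # Single pull from the server, saved to dd_file.
    download = [*grabber, "--dd-data", dd_file, "--download-only"]
    # One conversion per lineup, reusing the saved file instead of downloading.
    conversions = [
        [*grabber, "--dd-data", dd_file, "--reprocess",
         "--config-file", conf, "--output", out]
        for conf, out in lineup_configs
    ]
    return [download, *conversions]

commands = build_commands(
    "dd_data.xml",
    [("lineup1.conf", "lineup1.xml"), ("lineup2.conf", "lineup2.xml")],
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run once the options have been confirmed against the installed version.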
hnrsoftware wrote:6. The documentation does mention it, but it is not obvious that the Tribune Media Service normal operations will unilaterally change your channel settings in your SD online configuration, adding or changing things it feels like.
That's the way Tribune does things and until we (SD) host the data we can't do anything about it. (We are considering it). Again, tv_grab_na_dd in XMLTV mode solves this issue with the "--auto-config ignore" option.
hnrsoftware wrote:7. The dd-data download definitely interacts with caching at some point. The first of two identical retrievals takes maybe 10-15 seconds, the second 5-10 seconds. I can't visualize caching on my PC that would make sense, so it must be on the SD or TMS servers that the data is more available.
TMS has a few servers serving the data. In addition, since it passes via HTTP, your ISP may even be caching some data (although probably not anything this large or in SOAP envelopes).

You may want to consider using the XMLTV formatted data instead of the raw TMS data. For some applications the relational format is more of a hindrance than a benefit. In addition, you can benefit from other grabbers (if you care) and from features like local filtering (which could easily be added to your code as well, of course). Most (if not all) of the TMS data is in the XMLTV file... somewhere.

Robert
SD Board
acting XMLTV lead
tv_grab_na_dd author
(the answers don't get more definitive than this!)

hnrsoftware
Posts: 2
Joined: Mon Oct 15, 2007 7:28 am

Re: Personal Research (XMLTV) Things that may not be obvious

Post by hnrsoftware »

Hi Robert - as always, thanks for the definitive clarifications. My point was not to argue for or against XML or the program structure, but merely to clarify points so other developers would not be banging their heads against the wall wondering what they were doing wrong (as I did). Mostly, my solutions came from point #1 - READ THE DOCUMENTATION. I consider it a real bonus that you were so helpful when I got stuck.

At this point, I'm not suggesting any changes to the system - I just want to figure out what the current system is. If I can understand the rules, I can adapt to them.

To me, one of the oddest situations is the TMS system adding and deleting station selections in user accounts. It makes sense for any lineup changes to be reflected in a retrieval if you want all channels, but I can't visualize the logic that says they should add stations to a user account that has any stations marked as not wanted. That is neither here nor there; what is, is.

Right after I say I'm not making suggestions, I'm wondering if there would be a possibility for the XMLTV API to perform the fixes, such as "XMLTV tv_grab_na_dd --remove_station NNNNN". It is not that big of a deal to detect the additional stations in the downloaded data and then log on to SD and manually edit the configuration, but it is just annoying to have to do it.
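
For anyone wanting to automate the detection half of that, here is a minimal Python sketch. It assumes the downloaded file lists stations as elements named "station" carrying an "id" attribute; that structure, the namespace in the sample, and the station numbers are assumptions for illustration, so adjust the names to whatever your dd-data file actually contains.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def station_ids(xml_text):
    """Collect the id attribute of every <station> element in the document."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "station" and el.get("id")
    }

def unexpected_stations(xml_text, wanted):
    """Return station ids present in the download but not in the wanted set."""
    return sorted(station_ids(xml_text) - set(wanted))

# Hypothetical sample standing in for a real download.
SAMPLE = """<xtvd xmlns="urn:example">
  <stations>
    <station id="10021"/>
    <station id="10035"/>
    <station id="99999"/>
  </stations>
</xtvd>"""

print(unexpected_stations(SAMPLE, {"10021", "10035"}))  # → ['99999']
```

The list it prints is what would still have to be removed by hand in the SD account.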

rmeden
SD Board Member
Posts: 1563
Joined: Tue Aug 14, 2007 2:31 pm
Location: Cedar Hill, TX

Re: Personal Research (XMLTV) Things that may not be obvious

Post by rmeden »

hnrsoftware wrote:Right after I say I'm not making suggestions, I'm wondering if there would be a possibility for the XMLTV API to perform the fixes, such as "XMLTV tv_grab_na_dd --remove_station NNNNN". It is not that big of a deal to detect the additional stations in the downloaded data and then log on to SD and manually edit the configuration, but it is just annoying to have to do it.
While we can't prevent new stations from being added, it is now possible to write a client to do the --remove_station call as you describe. It hasn't been implemented, and probably won't be until we switch to locally hosted data (and then it may not be needed, since there will be an option to *not* add new stations).

Robert

bcmoney
Posts: 2
Joined: Mon May 02, 2011 11:09 am

Re: Personal Research (XMLTV) Things that may not be obvious

Post by bcmoney »

In particular, I found that GET requests are not supported, so technically the Tribune Web Service must still be following the SOAP 1.1 standard rather than SOAP 1.2, which specifies that both GET and POST are acceptable as long as the SOAP request envelope is passed via URL.
Since that is not supported, you have to use POST, and the request also has to include the BASIC authentication information in the header in the exact pattern:

Authorization: BASIC xxxxxxxxxxxxxxxxxxxxxxxxx


where:
xxxxxxxxxxxxxxxxxxxx = base64encoded(username:password)

Don't forget the colon as a string in between the username and password (like I did at first)!
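
A small Python sketch of building that header and attaching it to a POST, using only the standard library. The function names are made up for the example, and the URL and envelope are placeholders to be supplied by the caller, since neither is given here. Note that the sketch writes the scheme as "Basic", the usual spelling; HTTP defines the scheme name as case-insensitive, but that has not been verified against this particular service.

```python
import base64
import urllib.request

def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    # The credentials are joined with a colon, then base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def build_soap_post(url, envelope, username, password):
    """Build (but do not send) a POST request carrying a SOAP envelope."""
    return urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "text/xml; charset=utf-8",
        },
        method="POST",
    )

print(basic_auth_header("user", "pass"))  # → Basic dXNlcjpwYXNz
```

Passing the returned request to urllib.request.urlopen would send it; that step is left out because the endpoint and envelope contents are not covered here.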

Before any programming, try debugging using a SOAP tool... it DEFINITELY helps save sanity. I got it working using SOAP UI: http://soapui.org/
You need to do the following in SOAP UI:
  • File --> New SoapUI Project
  • Name it "SchedulesDirect" or something like that...
  • Check "Create TestSuite" and "TestCase"
  • Expand the project. Under "xtvdBinding" there should be a "download" Web Service call stub; expand that and open Request 1
  • Click the little "Auth" tab at the bottom of the Request window (easy to miss... look down next to "Headers" and "Attachments" tabs)
  • Enter your SchedulesDirect username and password, then click the little green "Run" arrow at the top left of the Request window
Coolness, this actually returns some XML describing my stations and schedules... now I just have to figure out how to request info on one of the channels or a particular schedule timeblock... update to come as soon as I do.
