
programs hash discrepancy

Posted: Thu Feb 16, 2017 2:03 am
by kenty
I am currently testing the service using the tv_grab_zz_sdjson_sqlite XMLTV grabber with lineup DEU-0002066-X. If you run a second update right after a full update, the grabber checks the lastDataUpdate time and downloads nothing if that time is older than the last update, which is of course a good thing.
However, when it then checks the database for missing or outdated programs (by comparing the hashes in the schedules and programs tables), it finds some apparent discrepancies. It downloads those programs, but the data received is identical to what it already had. Today there were 24 programs whose hash differed from the one in the schedules; yesterday there were 91.
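
For readers who want to reproduce the check, here is a minimal sketch of such a comparison in Python with sqlite3. It is not the grabber's actual code: the table and column names (schedules.program, schedules.program_hash, programs.program, programs.hash) are assumptions taken from the column headers in the example listing in this post, the real schema may differ, and the two EP00000000000x rows and their hashes are made-up placeholders.

```python
import sqlite3


def find_hash_mismatches(conn):
    """Return (program, schedule_hash, stored_hash) for programs to re-download.

    A program qualifies when it is missing from the programs table or when
    its stored hash differs from the hash referenced by the schedule.
    """
    query = """
        SELECT DISTINCT s.program, s.program_hash, p.hash
        FROM schedules AS s
        LEFT JOIN programs AS p ON p.program = s.program
        WHERE p.program IS NULL OR p.hash <> s.program_hash
        ORDER BY s.program
    """
    return conn.execute(query).fetchall()


# Small in-memory demonstration (schema is assumed, not the grabber's own).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (station TEXT, starttime TEXT,
                            program TEXT, program_hash TEXT);
    CREATE TABLE programs (program TEXT PRIMARY KEY, hash TEXT);
""")
conn.executemany("INSERT INTO schedules VALUES (?, ?, ?, ?)", [
    # First row: values from the example listing in this post.
    ("83621", "2017-02-25 08:30:00.000", "EP020034450003", "SX7O7Fbp8ViUbPEtrI9y+w"),
    # Placeholder rows: one matching program and one missing program.
    ("83621", "2017-02-25 09:15:00.000", "EP000000000001", "placeholder-hash-a"),
    ("83621", "2017-02-25 10:00:00.000", "EP000000000002", "placeholder-hash-b"),
])
conn.executemany("INSERT INTO programs VALUES (?, ?)", [
    ("EP020034450003", "SS9/eTmWjsION1u3x7O4mQ"),
    ("EP000000000001", "placeholder-hash-a"),
])

mismatches = find_hash_mismatches(conn)
for program, schedule_hash, stored_hash in mismatches:
    print(program, schedule_hash, stored_hash)
```

The LEFT JOIN is there so that programs absent from the programs table are reported along with those whose hash differs.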

(Schedules Direct data last updated on 2017-02-16T03:00:29)

Here are a few examples:


       schedules table                                                                        ---    programs table
station   day         starttime               duration program         program_hash               program         hash
83621     2017-02-25  2017-02-25 08:30:00.000 2700     EP020034450003  SX7O7Fbp8ViUbPEtrI9y+w --- EP020034450003  SS9/eTmWjsION1u3x7O4mQ
83645     2017-03-08  2017-03-08 18:30:00.000 2700     EP019834270099  QGOA0upOcJZmSR9GPZjBEw --- EP019834270099  oU7IAOkB5IwIzCGWxk5fRA
101806    2017-02-21  2017-02-21 17:35:00.000 3000     EP019560930011  cue1W5Fi6F9o5AWGE/03QA --- EP019560930011  7yaONMOdMUjKl85jWKOdrw
101806    2017-02-21  2017-02-21 23:50:00.000 2700     EP019560930079  Hh38Ei7uHkyQtDcZygsAKw --- EP019560930079  66Hmz1kxSpMG8O7T7PQvew
101806    2017-02-24  2017-02-24 18:25:00.000 2880     EP019560930047  AeDlIT2xPWVZAHb2f3qRDA --- EP019560930047  +8part0I2lMxiylQfPvAGg
Any thoughts?

Update: this happens every day, and with other lineups as well (I just checked ESP-0000965-X: 64 programs).

Re: programs hash discrepancy

Posted: Fri Feb 17, 2017 3:24 pm
by rkulagow
What you're seeing is what happens when the batch generation of schedules on our side collides with an updated program.

At time "t", the schedule for a station is generated, using the MD5 that we have for the programs at that time. At t+foo, but before the next batch process gets run, the program changes, so we refresh it in our table. That doesn't generate a refreshed schedule, because otherwise lots of schedules would be in flux.

So, your grabber requests programID EP000000000001, but between the time that the schedule was generated and the time that you downloaded it, something in the program changed upstream. That generates a new internal MD5 for that programID, and that causes the discrepancy.
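
The timing described above can be illustrated with a small, purely hypothetical simulation in Python. Nothing here reflects the real server code: the ToyServer class, its methods, and the way the MD5 is computed (MD5 of the program's JSON, base64-encoded with the padding stripped) are assumptions made only to show why a schedule generated at time "t" can carry a stale hash until the next batch run.

```python
import base64
import hashlib
import json


def program_md5(program):
    """Base64 MD5 of a program's JSON with padding stripped (assumed encoding)."""
    payload = json.dumps(program, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(payload).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


class ToyServer:
    """Hypothetical stand-in for the upstream service."""

    def __init__(self, programs):
        self.programs = dict(programs)
        self.schedule = {}
        self.run_batch()  # time "t": the schedule snapshots the current MD5s

    def run_batch(self):
        # Batch generation: copy each program's current MD5 into the schedule.
        self.schedule = {pid: program_md5(p) for pid, p in self.programs.items()}

    def update_program(self, pid, program):
        # Time "t+foo": only the program is refreshed, not the schedule.
        self.programs[pid] = program


pid = "EP000000000001"
server = ToyServer({pid: {"title": "Example", "description": "old text"}})
server.update_program(pid, {"title": "Example", "description": "new text"})

# The client downloads the schedule and the program between two batch runs.
stale = server.schedule[pid]
fresh = program_md5(server.programs[pid])
mismatch_before = stale != fresh

# After the next batch run the two hashes agree again.
server.run_batch()
mismatch_after = server.schedule[pid] != program_md5(server.programs[pid])

print("mismatch before next batch:", mismatch_before)
print("mismatch after next batch:", mismatch_after)
```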

If it's really going to be an issue, then I can create a new REST endpoint where you request MD5s and get back the same data as you would if you had requested the programID.

I actually looked at your first example when you posted this message, and on the next batch run the schedule MD5 and the program MD5 matched, because there hadn't been any more updates.

Re: programs hash discrepancy

Posted: Fri Feb 17, 2017 11:16 pm
by kenty
Thanks, it's not really an issue.

Sometimes a good explanation is all one needs to make sure that it's not a bug or misunderstanding ;)