Frequency of daily-schedule MD5 changes

TPeterson
Posts: 12
Joined: Mon Oct 13, 2014 11:43 am

Frequency of daily-schedule MD5 changes

Post by TPeterson »

My app, CW_EPG, has used the XTVD data for many years, but I've been updating it to use the SD-JSON service so that it can have wider geographical utility. Going in, I expected the daily schedules to change infrequently, but (aside from a recent period of over 2 days with zero changes and no new data, which I assume was a glitch) I've been seeing 40%-70% of the channel-days in several lineups change within 24 hours. What is known about these frequent changes? Do we really have to re-download half of our schedules every day to stay current?

kyl416
Posts: 18
Joined: Tue Feb 05, 2013 12:59 pm
Location: Tobyhanna, PA

Re: Frequency of daily-schedule MD5 changes

Post by kyl416 »

Some of it is because the networks don't send out episode descriptions until a few days prior, as with soaps and talk shows. Other changes happen because channels that don't stick to :00 and :30 start times only provide the up-to-the-minute airtimes a few days in advance. Plus, these days there are a lot of last-minute changes to schedules, like the kids channels and PBS stations making weekly changes as more states cancel school for the rest of the year.

SD-JSON also provides more days than SD-DD does, so that extra week of data changes frequently as Gracenote receives the schedules and descriptions from the various networks, distributors and production companies.

In many cases, especially on cable channels, it's just a change in airtimes while the underlying program descriptions are unchanged. If you haven't already, you should implement some kind of caching to store the data locally and keep track of these MD5 changes, so that you only download what has changed and retrieve everything else from the cache. The docs/wiki explain it better, but you should check for MD5 changes on the stations, redownload the schedules for the changed days to get updates, and then only download the programs whose MD5 has changed.
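The caching workflow described above can be sketched roughly as follows. This is only an illustration of the comparison step, not the SD-JSON API: the station IDs, program IDs, MD5 strings, and the `changed_keys` helper are all made-up placeholders, and the actual fetching of MD5s, schedules, and programs is left out.

```python
def changed_keys(cached, latest):
    """Return keys that are new or whose MD5 differs from the cached value."""
    return sorted(key for key, md5 in latest.items() if cached.get(key) != md5)


# Step 1: compare cached vs. freshly fetched schedule MD5s.
# Keys are hypothetical (stationID, date) pairs; values are placeholder hashes.
cached_day_md5 = {
    ("10001", "2020-05-14"): "aaa",
    ("10001", "2020-05-15"): "bbb",
}
latest_day_md5 = {
    ("10001", "2020-05-14"): "aaa",  # unchanged: serve from cache
    ("10001", "2020-05-15"): "ccc",  # changed: re-download this day
    ("10001", "2020-05-16"): "ddd",  # new day: download it
}
stale_days = changed_keys(cached_day_md5, latest_day_md5)

# Step 2: after re-downloading only the stale days, compare the program MD5s
# listed in those schedules against the program cache.
cached_program_md5 = {"EP0001": "p1", "EP0002": "p2"}
latest_program_md5 = {"EP0001": "p1", "EP0002": "p2-new"}
stale_programs = changed_keys(cached_program_md5, latest_program_md5)

print(stale_days)
print(stale_programs)
```

Only the entries in `stale_days` and `stale_programs` would need a network request; everything else comes from the local cache.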

TPeterson
Posts: 12
Joined: Mon Oct 13, 2014 11:43 am

Re: Frequency of daily-schedule MD5 changes

Post by TPeterson »

Yes, I am caching the MD5 values. That's how I know that they've been changing so much! I'm also tracking downloads of the PIDs, which, as expected, are not changing nearly as much. The changed MD5 values are not generally clustered toward one end or the other of the data timeframe, which is another thing that's puzzling. I was expecting to have to update only a few percent of the previously downloaded schedules each day, but the changing MD5 values are forcing me to re-fetch roughly half of my schedules every day. :x :shock:

Is there truly no way to do better? If not, I'm going to have to revert to the previous scheme of updating only a portion of the data, say "today, tomorrow, and any new days", but I did not see this coming. :(

kyl416
Posts: 18
Joined: Tue Feb 05, 2013 12:59 pm
Location: Tobyhanna, PA

Re: Frequency of daily-schedule MD5 changes

Post by kyl416 »

Yeah, that's what some other applications do for updates: today, tomorrow and new days.
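As a rough sketch of that "today, tomorrow and new days" scheme (the function name and the sample dates are assumptions for illustration, not anything taken from an existing grabber):

```python
from datetime import date, timedelta


def days_to_refresh(today, available_days, cached_days):
    """Pick today, tomorrow, and any available day that is not cached yet."""
    always = {today, today + timedelta(days=1)}
    return sorted(d for d in available_days if d in always or d not in cached_days)


today = date(2020, 5, 14)
available = [today + timedelta(days=i) for i in range(5)]  # days the service offers
cached = set(available[:4])                                # last day is not cached yet
refresh = days_to_refresh(today, available, cached)
print(refresh)
```

Days in between are served from the cache even if their MD5 has changed, which is the accuracy trade-off of this scheme.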

The thing is, if ANYTHING changes, it generates a new MD5 hash. For instance, if a specific episode gets an updated detail, the MD5 hash for its programID changes, and the MD5 hash for every day that episode airs changes as well, to alert apps/grabbers that something needs to be updated.

For example, CBS is doing classic theme weeks for their soaps since they ran out of new episodes. They only released the list of episodes for next week about an hour or two ago, so whenever that propagates down, the md5 hashes for all the CBS affiliate schedules will change for next week. And those md5 hashes will change again whenever The Talk, Colbert and Corden release their guest lists for next week.
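The propagation described above can be illustrated with a toy model. To be clear, this is not how Gracenote or Schedules Direct actually compute their hashes (that is not stated in this thread); the `day_hash` function, the IDs, and the descriptions are invented purely to show why one program update can change the hash of every day on which it airs.

```python
import hashlib


def md5_hex(text):
    """MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def day_hash(airings, program_md5):
    """Toy digest of one station-day: covers each start time and program MD5."""
    parts = [f"{start}|{pid}|{program_md5[pid]}" for start, pid in airings]
    return md5_hex("\n".join(parts))


# Hypothetical programs and schedule.
program_md5 = {"EP1": md5_hex("guest list TBA"), "EP2": md5_hex("rerun")}
schedule = {
    "mon": [("14:00", "EP1"), ("15:00", "EP2")],
    "tue": [("15:00", "EP2")],
}

before = {day: day_hash(airings, program_md5) for day, airings in schedule.items()}

# One program gets an updated detail, so its own hash changes...
program_md5["EP1"] = md5_hex("guests announced")

# ...and so does the hash of every day that airs it.
after = {day: day_hash(airings, program_md5) for day, airings in schedule.items()}
changed = [day for day in schedule if before[day] != after[day]]
print(changed)
```

In this toy model only the day that airs the updated program gets a new hash; a pure airtime change would alter the day hash too, without touching any program hash.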

gtb
Posts: 107
Joined: Thu Oct 02, 2014 2:07 pm

Re: Frequency of daily-schedule MD5 changes

Post by gtb »

TPeterson wrote:
Thu May 14, 2020 2:41 pm
"Is there truly no way to do better?"

What is your ultimate goal? Reducing download size/time, or providing the most accurate data available to your app(s), whether for individuals or for program scheduling purposes? Trying to thread the needle by doing a bit of both is possible, but it will be the wrong choice some of the time for some of the people.

In my experience, one can expect the daily schedule to change (in one way or another) regularly, while the referenced program details change far less often (and they are also checksummed, so you only have to download the programs that were actually updated). As a result, the amount of data downloaded to maintain accurate data for all the available days tends to be modest for most lineups, but it is clearly more than if you decide to update only one or two days at a time.

TPeterson
Posts: 12
Joined: Mon Oct 13, 2014 11:43 am

Re: Frequency of daily-schedule MD5 changes

Post by TPeterson »

Thanks for the perspectives. I had hoped that providing completely accurate daily updates would not require so much data, so that one could maintain very extensive lineups without extremely long connection times. Evidently that's not in the cards, so I'll have to decide how to compromise. I'm thinking now that it may be best to allow users to choose a maximum download time and then tailor the daily updates within that constraint. If the user wishes 100% accuracy, then she'll either have to live with a limited lineup or a potentially long download. Using the XTVD data, CW_EPG always just used the "today, tomorrow, and last day" scheme to keep times manageable. I set our sights higher for this more-granular data source, but evidently too high.
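One possible way to tailor updates to a user-chosen maximum download time is to fetch the nearest changed days first and defer the rest. This is only a sketch under assumptions: the per-fetch time estimate, the `plan_within_budget` name, and the sample station IDs are hypothetical, and this is not necessarily how CW_EPG does it.

```python
def plan_within_budget(stale_days, seconds_per_fetch, max_seconds):
    """Split changed (stationID, ISO date) pairs into fetch-now and deferred lists."""
    # ISO dates sort chronologically as strings, so the nearest days come first.
    ordered = sorted(stale_days, key=lambda sd: (sd[1], sd[0]))
    limit = int(max_seconds // seconds_per_fetch)
    return ordered[:limit], ordered[limit:]


stale = [
    ("10001", "2020-05-16"),
    ("10002", "2020-05-14"),
    ("10001", "2020-05-14"),
]
fetch_now, deferred = plan_within_budget(stale, seconds_per_fetch=2.0, max_seconds=5.0)
print(fetch_now)
print(deferred)
```

Anything left in `deferred` stays stale until a later run, so a user who wants 100% accuracy would need a budget large enough to cover every changed day.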

TPeterson
Posts: 12
Joined: Mon Oct 13, 2014 11:43 am

Re: Frequency of daily-schedule MD5 changes

Post by TPeterson »

After further work on this, I've sped things up enough to handle a reasonably large lineup with 100% daily updating. I'm still marveling over the frequency of the changes.

gtb
Posts: 107
Joined: Thu Oct 02, 2014 2:07 pm

Re: Frequency of daily-schedule MD5 changes

Post by gtb »

TPeterson wrote:
Sat Jun 06, 2020 10:06 am
"After further work on this, I've sped things up enough to handle a reasonably large lineup with 100% daily updating. I'm still marveling over the frequency of the changes."

As others have said, talk to the upstream's upstreams (the networks, stations, etc.) if you desire less churn. They control the data. The upstream guide provider (Gracenote) just tries to collect and curate the data as they get it and make it available to Schedules Direct for distribution to "us". I have just come to expect constant churn.
