I didn't even think that through.

rkulagow wrote:
No. The schedule file itself is fairly lightweight, since it doesn't contain any real program details. The program files are what would make up the bulk of the transfer, and if there are a lot of repeats on a schedule then you only download the prog_id a single time.
rkulagow wrote:
That was investigated, but it took way too long to assemble everything. Since the client is supposed to download the schedule first, determine which programs it needs, and then request only those programs, we can't really bundle everything into one file.

I was completely unclear there... my thinking was that each one would individually be a gzip stream. Here's the process as it is now:
Download lineup -> Unzip -> Read in file(s) 1 by 1 -> Process
Download schedule -> Unzip -> Read in file(s) 1 by 1 -> Process -> Determine programs needing update
What my thought was:
Download lineup (single gzip stream) -> Process
Download schedule (single gzip stream) -> Process -> Compare MD5 key to one in database -> If changed:
Download programs (single gzip stream with X programs) -> Process -> Repeat until processed
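A rough Python sketch of that single-stream idea. This assumes, purely for illustration, that each stream is newline-delimited JSON with `prog_id` and `md5` fields; the real SD format may differ:

```python
import gzip
import io
import json

def iter_gzip_records(gz_bytes):
    """Treat the response body as one gzip stream and yield one JSON
    record per line, entirely in memory -- no unzip-to-disk step and
    no per-file read-in.

    Assumes (hypothetically) newline-delimited JSON records.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

The same loop would serve all three steps: lineup, schedule, and each batch of X programs, each arriving as its own gzip stream.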
The unzipping and file read-ins seem to make the process pretty I/O heavy. Obviously the client writers would need to take care downloading program data, since that is where the bulk of the transfer comes from. But even if it didn't work for programs, for whatever reason, if the lineup and schedule are lightweight (which they should be) there's really no reason to keep unzipping as a part of the process.
rkulagow wrote:
Maybe, if you make your case strong enough.

The only reason I brought that up is because there is a ton of crossover data between SD's programs and TVDB's series data. Giving the TVDB data in the schedule would let the client know which source to use for episode/program metadata.
The design is starting to make more sense now, and given the schedule data is so small, it seems moot to download this daily to look for schedule changes.

hall5714 wrote:
You won't get stale data. Each time you get the sched file for a stationID, it will contain the next 12-14 days' worth of programs that are on the stationID. If any program gets updated (say a new guest star is added, or metadata is updated, or whatever) on "today + 4 days", then while the prog_id stays the same, the MD5 will change. The client should look for that. In the same way, if you've downloaded a program and it's in your database, and the program comes up again on the schedule, but the MD5 is the same, then you don't need to download it again.
The schedules are pre-generated, so no, can't make them a shorter duration; each stationID will have the next 12-14 days.
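That MD5 check is cheap for a client to implement. A minimal sketch, assuming the client keeps a local map of prog_id -> MD5 (the field names here are illustrative, not the actual SD schema):

```python
def stale_program_ids(schedule_entries, local_md5s):
    """Return the prog_ids that need (re)downloading: programs we've
    never seen, plus programs whose MD5 changed because their metadata
    was updated. Using a set collapses repeats, so a program airing
    many times in the 12-14 day window is only fetched once.

    schedule_entries: dicts with hypothetical 'prog_id' and 'md5' keys.
    local_md5s: dict mapping prog_id -> previously stored MD5.
    """
    stale = set()
    for entry in schedule_entries:
        if local_md5s.get(entry["prog_id"]) != entry["md5"]:
            stale.add(entry["prog_id"])
    return stale
```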
I guess the only thing that really leaves is the question of whether or not we could get lineup data in one file and schedule data in one file. Whether they are gzipped or zipped is moot, I suppose (most languages can handle either in place, though for a single file gzip is really the way to go), but a single file vs. multiple files would be a great deal easier to process in place... and, as such, a great deal easier to turn into a Python module without obligating callback handling on many files.
Thanks for the response and clarification!