Duplicate Check method enhancement....

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Duplicate Check method enhancement....

Post by diyhouse » Thu Sep 11, 2014 8:41 am

I would like to propose to the dev community an enhancement to improve duplicate checking of UK programmes (not sure if this applies to other countries).

Problem statement:-
Programmes are transmitted and repeated at different times, as we know, but for some reason broadcasters change the descriptive text associated with the same programme. This causes the same programme to be recorded twice.

Enhancement Request:-
Would it be possible to use fuzzy comparison (or a similar technique) to look for similar words in the descriptions and, if something like a 50% match is found, treat them as the same programme and not record the second occurrence?


Example of Problem:-

------------ Channel 4 HD
Company director William Reeve works on the shop floor of Irish bookmaker and gaming business Paddy Power. While undercover, he encounters shocking attitudes among some of his staff and meets a team that has experienced several armed robberies, as well as a customer services agent who has to deal with racial abuse and suicide threats. (HDTV, Widescreen, Rerun)
------------Channel 4-Seven HD
Company director William Reeve goes undercover on the shop floor of multi-million-pound gaming business Paddy Power and unearths some shocking problems behind the scenes. (HDTV, Widescreen, AVC/H.264, Surround Sound)
------------
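
For anyone wanting to experiment with the idea, here is a minimal Python sketch of a word-overlap comparison. It is only an illustration: it uses Jaccard similarity on word sets rather than true fuzzy logic, the function names and the 0.5 threshold are my own assumptions, and it is not how MythTV matches duplicates.

```python
import re


def strip_flags(text):
    """Drop a trailing parenthesised flag list such as '(HDTV, Widescreen, Rerun)'."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", text)


def tokens(text):
    """Lower-case the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_overlap(a, b):
    """Jaccard similarity of the two word sets: shared words / all distinct words."""
    ta, tb = tokens(strip_flags(a)), tokens(strip_flags(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def looks_like_duplicate(a, b, threshold=0.5):
    """Hypothetical rule: treat two descriptions as the same programme above the threshold."""
    return word_overlap(a, b) >= threshold
```

Feeding it the two descriptions above gives a score to compare against the threshold. Common words such as "the" and "of" count towards the overlap, so a real implementation would probably want a stop-word list as well.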

dekarl
Developer
Posts: 228
Joined: Thu Feb 06, 2014 11:01 pm
Germany

Re: Duplicate Check method enhancement....

Post by dekarl » Thu Sep 11, 2014 10:43 am

Can you post the entries from the programs table? Looking at their website hints that there are episode titles for this series. If you get the episode title "Paddy Power" you can just detect duplicates using that. But I would have expected to see matching unique programids for repeats on UK TV. (In the latter case duplicate matching defaults to use the programids)
What guide data is that? DVB-EIT or from tv_grab_uk_rt (the Radio Times XMLTV feed) or from tv_grab_uk_atlas (the new Atlas guide feed) or something else?

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Thu Sep 11, 2014 12:44 pm

The web description for E3S4 gives the following:-
---------- Web page description:-
Company director William Reeve goes undercover on the shop floor of multi-million-pound gaming business Paddy Power.
----------
I use the XMLTV feed from Radio Times, and many programmes across the full line-up change their descriptive text when repeated. Why, I do not know, as this clearly causes more work for them. Strange.

rwagner
Developer
Posts: 217
Joined: Thu Feb 06, 2014 11:37 pm
United States of America

Re: Duplicate Check method enhancement....

Post by rwagner » Thu Sep 11, 2014 2:24 pm

I'll be honest, in this particular example, without other information like subtitle or program ID, I wouldn't consider that close enough to be a duplicate.

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Thu Sep 11, 2014 3:10 pm

Here is the extra info. Neither has a subtitle entry; I don't know why.

What I don't understand is why they have different IDs when they are actually the same programme.

Is the difference here due to HD and SD broadcasts?

------------------------------
Company director William Reeve goes undercover on the shop floor of multi-million-pound gaming business Paddy Power and unearths some shocking problems behind the scenes. (HDTV, Widescreen, AVC/H.264, Subtitles Available, Surround Sound)

Program ID: www.channel4.com/53154/003
Length: 1 hr 4 mins
File Size: 2.5 GB
Recording Group: Default
Playback Group: Default




--------------------------------
Company director William Reeve works on the shop floor of Irish bookmaker and gaming business Paddy Power. While undercover, he encounters shocking attitudes among some of his staff and meets a team that has experienced several armed robberies, as well as a customer services agent who has to deal with racial abuse and suicide threats. (HDTV, Widescreen, AVC/H.264, Subtitles Available, Stereo)

Category: documentary
Episode Number: E3S4
Program ID: EP10986601934
Directed by: Lucy Bing
Length: 1 hr 4 mins
File Size: 4.5 GB
Recording Group: Default
Playback Group: Default

dizygotheca
Developer
Posts: 225
Joined: Wed Sep 03, 2014 9:02 am
Great Britain

Re: Duplicate Check method enhancement....

Post by dizygotheca » Fri Sep 12, 2014 12:54 am

The first Program Id (4-seven) is a CRID coming out of EIT data; the second (C4) is a Myth-generated id from XML guide data.

That also explains why the descriptions are different.

If you're intentionally mixing EIT & XML then Myth will always consider the 2 channels to be broadcasting different programmes/episodes, as the program ids from each source are different & incompatible. Title matching won't/can't occur because both programmes have non-generic ids. See http://www.mythtv.org/wiki/Duplicate_matching
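
To make that rule concrete, here is a rough Python sketch of the decision order as described above. It is a simplification for illustration only, not MythTV's scheduler code: the `Programme` class, the `generic` flag and the fallback comparison are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Programme:
    title: str
    subtitle: str = ""
    description: str = ""
    programid: str = ""
    generic: bool = False  # assumption: True when the id only identifies the series


def has_specific_id(p):
    """A programme has a usable id when one is present and it is not generic."""
    return bool(p.programid) and not p.generic


def is_duplicate(a, b):
    """Hypothetical matching order: ids win whenever both sides have a specific one."""
    if a.title != b.title:
        return False
    if has_specific_id(a) and has_specific_id(b):
        # Text is never consulted here, so ids from different sources never match.
        return a.programid == b.programid
    # Fallback (simplified): compare subtitle and description, if there is any text.
    has_text = bool(a.subtitle or a.description)
    return has_text and (a.subtitle, a.description) == (b.subtitle, b.description)
```

With the two ids from this thread (www.channel4.com/53154/003 from EIT and EP10986601934 from XML), the id branch is taken and returns False, however similar the descriptions are.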

If you thought your 4-Seven guide was also coming from RT XML then you've mistakenly still got EIT turned on, which is overwriting the XML guide data: not good!

If you're using EIT because RT doesn't provide 4-Seven HD data (as RT doesn't provide new channels) then you can switch to the new Atlas grabber https://github.com/honir/tv_grab_uk_atlas that provides all channels. You'll have to install the latest xmltv manually though http://sourceforge.net/projects/xmltv/ as it hasn't reached the repositories yet.

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Fri Sep 12, 2014 9:13 am

Yes, you're right. I was looking at my mythweb channel stuff yesterday and noticed that 4Seven HD had no XMLTV reference. I duly filled in the missing data and updated my channel-org script, but I did not connect that with this duplicate issue.

I will look at the new atlas grabber, as RT misses other channels as well.

My myth understanding continues to grow,.. thanks for the education.....

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Mon Sep 15, 2014 3:39 pm

Further to the comments about the atlas grabber,...

What are the pros and cons of shifting to atlas now? Are the progIDs compatible, or will I lose all record of previously recorded programmes, which have been painstakingly filtered over the past months or so?

Can I do:-
add-apt-repository http://sourceforge.net/projects/xmltv/
sudo apt-get install xmltv

or is it a case of copying the files over, decompressing, then doing a configure, make, make install, etc. and then running it?

When is this stuff expected to officially kick over in to the main repositories build?

thanks

dizygotheca
Developer
Posts: 225
Joined: Wed Sep 03, 2014 9:02 am
Great Britain

Re: Duplicate Check method enhancement....

Post by dizygotheca » Mon Sep 15, 2014 10:59 pm

Yes, the ProgIds are compatible - your duplicate matching will still work. They're actually generated by Myth from the title/subtitle/season/episode I believe. There may be the odd hiccup because I think the RT grabber used to clean up some title/subtitles ("fix-ups") whereas the Atlas one will not but generally you shouldn't notice any difference, other than getting data for every TV channel & radio too.

You cannot use apt-get yet as there is no xmltv repository that I am aware of. I have no idea when the new xmltv will get into the distribution ones. If you can wait then this is obviously the easiest way.

Otherwise, you'll already have an xmltv package installed (0.5.63-2?) but Atlas needs 0.5.64, so the only way to use it currently is:

1. Uninstall your existing xmltv packages (xmltv, xmltv-gui, xmltv-util, libxmltv-perl)
2. Download 0.5.65 and make as per the README. It's quite simple, although you may need to satisfy some other minor dependencies along the way.
3. To use Atlas you need to get a (free) key from metabroadcast, which the grabber requires as part of its configuration. This is a real pain because their website/API is a nightmare IMHO. And you can only log in from a Twitter, Google or Facebook account.
It's done from http://metabroadcast.com/blog/create-an ... as-api-key. And as the grabber INSTALL (https://github.com/honir/tv_grab_uk_atl ... er/INSTALL) states:

"Remember to request Press Association (PA) as the content provider on your Atlas API key."

Once you've got the magic key, you never need to go near it again (I hope!). One day I suspect the RT feed will break/be turned off/become too expensive to maintain, so better you migrate at a time of your choosing than theirs...
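
On the version requirement above: if you want to script the check, version strings need comparing as numbers rather than as text, because a plain string comparison would rank "0.5.100" below "0.5.64". A minimal Python sketch, where the function names are my own and the 0.5.64 minimum is the figure quoted above:

```python
import re


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum="0.5.64"):
    """True when the installed version is at least the minimum (packaging suffix ignored)."""
    return parse_version(installed) >= parse_version(minimum)
```

Pass it whatever version string your package manager reports. A packaging suffix such as the "-2" in 0.5.63-2 is ignored.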

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Fri Sep 26, 2014 6:55 pm

Yep!! That is one seriously painful website / user interface; it rates as one of my top ten worst! Ahhhhh

Anyway, I have got a key, eventually, with PA, BBC, and UK Channel Line-ups enabled. Totally hit and miss getting there, mind you... :-)

What I am struggling with, before I go the whole hog, is why BBC HD does not work with my key.

The only nice thing I can say about the site is that one can test one's keys to see if they work.

Mine works fine for all the test channels except BBC HD. Any ideas? Does your key behave the same on the front URL test page?

I haven't got there yet, but how does atlas handle the two sources? Do I just rename the output of the --configure run and call it Video_source.xmltv in the .mythtv folder, just like the RT version?

We can but hope someone looks at the atlas key interface....
thanks

dizygotheca
Developer
Posts: 225
Joined: Wed Sep 03, 2014 9:02 am
Great Britain

Re: Duplicate Check method enhancement....

Post by dizygotheca » Fri Sep 26, 2014 11:12 pm

The website "BBC HD"/cbbV doesn't work for me either - I think it's the wrong code. I don't use HD but it doesn't match anything from

grep -e 'BBC.*HD' ~/.xmltv/supplement/tv_grab_uk_atlas/tv_grab_uk_atlas.map.channels.conf

Yes: just copy/link/rename the tv_grab_uk_atlas.conf to ~mythtv/.mythtv/Video_source.xmltv
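
If you would rather do the same search from Python (for example to make it case-insensitive or to reuse it in a script), a small sketch follows. It treats the map file as plain text and assumes nothing about its internal format; the function names are hypothetical and the path is the one from the grep command above.

```python
import re
from pathlib import Path

# Same file the grep command above searches.
MAP_FILE = Path.home() / ".xmltv/supplement/tv_grab_uk_atlas/tv_grab_uk_atlas.map.channels.conf"


def matching_lines(lines, pattern=r"BBC.*HD"):
    """Return the lines that contain a match for the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if rx.search(line)]


def search_map_file(path=MAP_FILE, pattern=r"BBC.*HD"):
    """Search the channel map file line by line, like grep -i."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return matching_lines(handle, pattern)
```

Calling `search_map_file()` lists every line mentioning a BBC HD channel; an empty list means the pattern matched nothing in the file.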

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Mon Nov 17, 2014 3:58 pm

Well, I'm not sure I've installed and configured everything correctly; it just seemed too simple...

Removed the existing xmltv via the graphical tool.
Downloaded and installed the latest xmltv, 0.5.66 (followed the INSTALL instructions): "piece of cake" (glory before the fall).
Downloaded and installed atlas. Note the hidden file structure for the 4 ...conf files.
(I did back up my Radio Times video source files before overwriting them, just in case.)
Ran the atlas configure step and just took all channels for my area.
Copied the files to .mythtv/video_source_name.xmltv

Configured the backend video sources to use atlas.

Ran mythfilldatabase (took forever, with loads of channels not found, but I don't use everything, so I guess that's not unreasonable).

Restarted mythbackend and the frontend, then reviewed the programme guide: it's populated with text and info, and looks good.

Is it really that simple? I can't believe my luck. Do I need to change the xmltv references? I assume not.

I assume atlas handles all the name translations etc....

fingers crossed....

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Wed Nov 19, 2014 5:40 pm

Things were too easy. Although programme data is being downloaded, the system is using EIT as well as xmltv data.

As I said previously, I am still using the 'old' xmltv references. Having looked at the reference list for these, I see it relates specifically to Radio Times data.

Can anyone point me towards the official atlas / PA list of xmltv references (assuming there is one)? I assume there must be some corrections to the tsod.xxxxxx prefix stuff, as I believe this is an RT fudge fix.

Can someone tell me what ties myth channels to atlas searches? I assume it is the xmltv references, tied to my location (postcode/area) when the configuration step is run. So how do I know which references to remove from the video-source.xmltv file, as they seem to change based on location (but I may be wrong here)?
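
One way to investigate, sketched in Python under some assumptions: as far as I understand it, guide data is tied to a MythTV channel by comparing the channel's xmltv reference (its xmltvid) with the id attribute of the channel elements in the grabber's XMLTV output. The helper names below are hypothetical, and how you collect your own xmltvids (for example from mythweb) is left to you.

```python
import xml.etree.ElementTree as ET


def xmltv_channel_ids(xml_text):
    """Collect the id attribute of every <channel> element in an XMLTV document."""
    root = ET.fromstring(xml_text)
    return {ch.get("id") for ch in root.iter("channel") if ch.get("id")}


def compare_ids(myth_xmltvids, xml_text):
    """Show which configured xmltvids the grabber output does and does not contain."""
    grabber_ids = xmltv_channel_ids(xml_text)
    configured = {x for x in myth_xmltvids if x}  # skip channels with no reference set
    return {
        "matched": sorted(configured & grabber_ids),
        "only_in_mythtv": sorted(configured - grabber_ids),
        "only_in_grabber": sorted(grabber_ids - configured),
    }
```

Anything under "only_in_mythtv" is a reference the grabber never supplies (a likely leftover from the RT list), and anything under "only_in_grabber" is a channel the grabber offers that no MythTV channel points at.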

Any additional reference stuff on atlas welcome.
Many thanks

diyhouse
Senior
Posts: 160
Joined: Mon Mar 31, 2014 9:42 am
Great Britain

Re: Duplicate Check method enhancement....

Post by diyhouse » Fri Nov 21, 2014 12:58 pm

As a means of trying to move this forward,. I will essentially close this thread here as I feel I have exhausted the original question, and open a general discussion thread.

Thanks to the folks who have responded here, though.
