charset encoding problem with EIT

renatoriolino
Newcomer
Posts: 6
Joined: Thu Jun 18, 2015 5:06 pm
Brazil

charset encoding problem with EIT

Post by renatoriolino » Thu Jun 18, 2015 5:19 pm

Hello,

This is my first post here. My name is Renato and I'm from Brazil. I switched to MythTV a couple of days ago; previously I was using tvheadend.

My system is basically an Ubuntu Linux box with a dual DVB-S card running MythTV backend 0.27 (fetched from git and compiled with the unicable patch). The frontends are four Raspberry Pis around my home running OpenELEC with the MythTV PVR plugin.

I am having some trouble with accented characters from my DVB-S broadcaster (the satellite is StarOne C2). I think MythTV is not recognizing the correct charset encoding. I googled this and found a similar problem reported by a user from Hungary (I guess), but neither my English nor my technical knowledge was enough to solve it (he was asked to run dvbsnoop and then make some changes to the source code).

Some things to note:
- With tvheadend, all the accents were correctly detected and displayed.
- The problem occurs in the MythTV frontend too (I checked the MySQL database, which is UTF-8, and the accents are wrong there as well).

I ran dvbsnoop as follows: dvbsnoop -n 2 -nph 0x12

And this is the important (I guess) part:

DVB-DescriptorTag: 77 (0x4d) [= short_event_descriptor]
descriptor_length: 46 (0x2e)
ISO639_2_language_code: por
event_name_length: 41 (0x29)
event_name: "Motoqueiro Fantasma: Esp<ED>rito de Vingan<E7>a" -- Charset: Latin alphabet
text_length: 0 (0x00)
text_char: ""

The full output of the above command can be checked here: http://rbr.no-ip.com:8022/snoop

Let me know if you guys need any more info.

Thanks

Renato

dekarl
Developer
Posts: 228
Joined: Thu Feb 06, 2014 11:01 pm
Germany

Re: charset encoding problem with EIT

Post by dekarl » Fri Jun 19, 2015 9:15 pm

renatoriolino wrote:I am having some troubles with accented characters from my DVB-S broadcaster (sat is StarOne C2). I think MythTv is not recognizing the correct charset encoding.
...
And this is the important (I guess) part:

DVB-DescriptorTag: 77 (0x4d) [= short_event_descriptor]
descriptor_length: 46 (0x2e)
ISO639_2_language_code: por
event_name_length: 41 (0x29)
event_name: "Motoqueiro Fantasma: Esp<ED>rito de Vingan<E7>a" -- Charset: Latin alphabet
text_length: 0 (0x00)
text_char: ""

The full output of the above command can be checked here: http://rbr.no-ip.com:8022/snoop
It appears that your provider just set original_network_id to 1, which is registered for Astra 19.2°E. This is a problem, as we key our guide fixes to original_network_id/transport_id/service_id.
Can you post a hexdump of the event and also a hexdump of the correct title/episode title/description?


I have started adding support for "items" in the transmitted guide. Can you post one or two example events, including a hexdump, somewhere? What codes are used for "item_description"? Are the names always such a nice comma-separated list?
dvbsnoop wrote: item_description_length: 6 (0x06)
item_description: "Actors" -- Charset: Latin alphabet
item_length: 90 (0x5a)
item: "Nicolas Cage, Violante Placido, Ciarán Hinds, Johnny Whitworth, Idris Elba, Fergus Riordan" -- Charset: Latin alphabet
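
Until more samples are available, an item value such as the "Actors" list above could be split defensively. This is a minimal sketch, assuming names are separated by commas and may carry stray spaces or empty entries. `split_names` and `trim` are hypothetical helpers, not MythTV functions.

```cpp
#include <string>
#include <vector>

// Trim ASCII spaces from both ends of a string.
static std::string trim(const std::string &s)
{
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Split a comma-separated item value into names.
// Empty entries (for example from a doubled comma) are dropped.
std::vector<std::string> split_names(const std::string &item)
{
    std::vector<std::string> names;
    std::string::size_type start = 0;
    while (start <= item.size())
    {
        const auto comma = item.find(',', start);
        const auto end = (comma == std::string::npos) ? item.size() : comma;
        const std::string name = trim(item.substr(start, end - start));
        if (!name.empty())
            names.push_back(name);
        start = end + 1;
    }
    return names;
}
```

The function works on raw bytes, so it behaves the same before or after any charset conversion, as long as the comma and space are single-byte ASCII.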

renatoriolino
Newcomer
Posts: 6
Joined: Thu Jun 18, 2015 5:06 pm
Brazil

Re: charset encoding problem with EIT

Post by renatoriolino » Wed Jun 24, 2015 11:48 am

Sorry for the long delay in answering. I was away from home and have only just returned.

Yes, my provider is sending network id as 1:

Code:

------------------------------------------------------------
SECT-Packet: 00000001   PID: 18 (0x0012), Length: 174 (0x00ae)
Time received: Wed 2015-06-24  08:21:17.836
------------------------------------------------------------
PID:  18 (0x0012)  [= assigned for: DVB Event Information Table (EIT)]

Guess table from table id...
EIT-decoding....
Table_ID: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]
section_syntax_indicator: 1 (0x01)
reserved_1: 1 (0x01)
reserved_2: 3 (0x03)
Section_length: 171 (0x00ab)
Service_ID: 86 (0x0056)  [=  --> refers to PMT program_number]
reserved_3: 3 (0x03)
Version_number: 3 (0x03)
current_next_indicator: 1 (0x01)  [= valid now]
Section_number: 192 (0xc0)
Last_Section_number: 200 (0xc8)
Transport_stream_ID: 54 (0x0036)
Original_network_ID: 1 (0x0001)  [= Astra Satellite Network 19,2<B0>E | Soci<E9>t<E9> Europ<E9>enne des Satellites]
Segment_last_Section_number: 192 (0xc0)
Last_table_id: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]
Is it possible to force/override the network_id via config file?

Here is a hexdump of an event:

Code:

00001690  35 29 0a 20 20 20 20 20  20 20 20 20 20 20 20 65  |5).            e|
000016a0  76 65 6e 74 5f 6e 61 6d  65 3a 20 22 49 67 72 65  |vent_name: "Igre|
000016b0  6a 61 20 49 6e 74 65 72  6e 61 63 69 6f 6e 61 6c  |ja Internacional|
000016c0  20 64 61 20 47 72 61 e7  61 20 64 65 20 44 65 75  | da Gra.a de Deu|
000016d0  73 22 20 20 2d 2d 20 43  68 61 72 73 65 74 3a 20  |s"  -- Charset: |
000016e0  4c 61 74 69 6e 20 61 6c  70 68 61 62 65 74 0a 20  |Latin alphabet. |
Where the dump shows e7, the correct UTF-8 encoding of "ç" would be c3 a7.
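
For reference, the conversion is mechanical if the data really is ISO 8859-1: each byte value is also the Unicode code point, so bytes from 0x80 upward become two UTF-8 bytes. This is a minimal sketch under that assumption. `latin1_to_utf8` is a hypothetical helper, and it does not treat DVB control codes specially.

```cpp
#include <string>

// Convert ISO 8859-1 bytes to UTF-8.
// Assumption: the input really is ISO 8859-1, where every byte value
// equals its Unicode code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (const char ch : in)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            // ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
        }
        else
        {
            // Code points 0x80..0xFF need a two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

With this, the byte e7 becomes c3 a7, matching the expected "ç".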

Here are some item examples:

Code:

            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 254 (0xfe)
            descriptor_number: 0 (0x00)
            last_descriptor_number: 1 (0x01)
            ISO639_2_language_code:  por
            length_of_items: 172 (0xac)

               item_description_length: 6 (0x06)
               item_description: "Actors"  -- Charset: Latin alphabet
               item_length: 100 (0x64)
               item: "Jeremy Irons, Jack Huston, M<E9>lanie Laurent,, Martina Gedeck, Tom Courtenay, August Diehl, Bruno Ganz"  -- Charset: Latin alphabet

               item_description_length: 8 (0x08)
               item_description: "Director"  -- Charset: Latin alphabet
               item_length: 12 (0x0c)
               item: "Bille August"  -- Charset: Latin alphabet

               item_description_length: 4 (0x04)
               item_description: "year"  -- Charset: Latin alphabet
               item_length: 4 (0x04)
               item: "2013"  -- Charset: Latin alphabet

               item_description_length: 7 (0x07)
               item_description: "country"  -- Charset: Latin alphabet
               item_length: 23 (0x17)
               item: "Alemanha/Portugal/Sui<E7>a"  -- Charset: Latin alphabet
And the hexdump:

Code:

00000800  20 20 44 56 42 2d 44 65  73 63 72 69 70 74 6f 72  |  DVB-Descriptor|
00000810  54 61 67 3a 20 37 38 20  28 30 78 34 65 29 20 20  |Tag: 78 (0x4e)  |
00000820  5b 3d 20 65 78 74 65 6e  64 65 64 5f 65 76 65 6e  |[= extended_even|
00000830  74 5f 64 65 73 63 72 69  70 74 6f 72 5d 0a 20 20  |t_descriptor].  |
00000840  20 20 20 20 20 20 20 20  20 20 64 65 73 63 72 69  |          descri|
00000850  70 74 6f 72 5f 6c 65 6e  67 74 68 3a 20 32 35 34  |ptor_length: 254|
00000860  20 28 30 78 66 65 29 0a  20 20 20 20 20 20 20 20  | (0xfe).        |
00000870  20 20 20 20 64 65 73 63  72 69 70 74 6f 72 5f 6e  |    descriptor_n|
00000880  75 6d 62 65 72 3a 20 30  20 28 30 78 30 30 29 0a  |umber: 0 (0x00).|
00000890  20 20 20 20 20 20 20 20  20 20 20 20 6c 61 73 74  |            last|
000008a0  5f 64 65 73 63 72 69 70  74 6f 72 5f 6e 75 6d 62  |_descriptor_numb|
000008b0  65 72 3a 20 31 20 28 30  78 30 31 29 0a 20 20 20  |er: 1 (0x01).   |
000008c0  20 20 20 20 20 20 20 20  20 49 53 4f 36 33 39 5f  |         ISO639_|
000008d0  32 5f 6c 61 6e 67 75 61  67 65 5f 63 6f 64 65 3a  |2_language_code:|
000008e0  20 20 70 6f 72 0a 20 20  20 20 20 20 20 20 20 20  |  por.          |
000008f0  20 20 6c 65 6e 67 74 68  5f 6f 66 5f 69 74 65 6d  |  length_of_item|
00000900  73 3a 20 31 37 32 20 28  30 78 61 63 29 0a 0a 20  |s: 172 (0xac).. |
00000910  20 20 20 20 20 20 20 20  20 20 20 20 20 20 69 74  |              it|
00000920  65 6d 5f 64 65 73 63 72  69 70 74 69 6f 6e 5f 6c  |em_description_l|
00000930  65 6e 67 74 68 3a 20 36  20 28 30 78 30 36 29 0a  |ength: 6 (0x06).|
00000940  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 69  |               i|
00000950  74 65 6d 5f 64 65 73 63  72 69 70 74 69 6f 6e 3a  |tem_description:|
00000960  20 22 41 63 74 6f 72 73  22 20 20 2d 2d 20 43 68  | "Actors"  -- Ch|
00000970  61 72 73 65 74 3a 20 4c  61 74 69 6e 20 61 6c 70  |arset: Latin alp|
00000980  68 61 62 65 74 0a 20 20  20 20 20 20 20 20 20 20  |habet.          |
00000990  20 20 20 20 20 69 74 65  6d 5f 6c 65 6e 67 74 68  |     item_length|
000009a0  3a 20 31 30 30 20 28 30  78 36 34 29 0a 20 20 20  |: 100 (0x64).   |
000009b0  20 20 20 20 20 20 20 20  20 20 20 20 69 74 65 6d  |            item|
000009c0  3a 20 22 4a 65 72 65 6d  79 20 49 72 6f 6e 73 2c  |: "Jeremy Irons,|
000009d0  20 4a 61 63 6b 20 48 75  73 74 6f 6e 2c 20 4d e9  | Jack Huston, M.|
000009e0  6c 61 6e 69 65 20 4c 61  75 72 65 6e 74 2c 2c 20  |lanie Laurent,, |
000009f0  4d 61 72 74 69 6e 61 20  47 65 64 65 63 6b 2c 20  |Martina Gedeck, |
00000a00  54 6f 6d 20 43 6f 75 72  74 65 6e 61 79 2c 20 41  |Tom Courtenay, A|
00000a10  75 67 75 73 74 20 44 69  65 68 6c 2c 20 42 72 75  |ugust Diehl, Bru|
00000a20  6e 6f 20 47 61 6e 7a 22  20 20 2d 2d 20 43 68 61  |no Ganz"  -- Cha|
00000a30  72 73 65 74 3a 20 4c 61  74 69 6e 20 61 6c 70 68  |rset: Latin alph|
00000a40  61 62 65 74 0a 0a 20 20  20 20 20 20 20 20 20 20  |abet..          |
00000a50  20 20 20 20 20 69 74 65  6d 5f 64 65 73 63 72 69  |     item_descri|
00000a60  70 74 69 6f 6e 5f 6c 65  6e 67 74 68 3a 20 38 20  |ption_length: 8 |
00000a70  28 30 78 30 38 29 0a 20  20 20 20 20 20 20 20 20  |(0x08).         |
00000a80  20 20 20 20 20 20 69 74  65 6d 5f 64 65 73 63 72  |      item_descr|
00000a90  69 70 74 69 6f 6e 3a 20  22 44 69 72 65 63 74 6f  |iption: "Directo|
00000aa0  72 22 20 20 2d 2d 20 43  68 61 72 73 65 74 3a 20  |r"  -- Charset: |
00000ab0  4c 61 74 69 6e 20 61 6c  70 68 61 62 65 74 0a 20  |Latin alphabet. |
00000ac0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 69 74  |              it|
00000ad0  65 6d 5f 6c 65 6e 67 74  68 3a 20 31 32 20 28 30  |em_length: 12 (0|
00000ae0  78 30 63 29 0a 20 20 20  20 20 20 20 20 20 20 20  |x0c).           |
00000af0  20 20 20 20 69 74 65 6d  3a 20 22 42 69 6c 6c 65  |    item: "Bille|
00000b00  20 41 75 67 75 73 74 22  20 20 2d 2d 20 43 68 61  | August"  -- Cha|
00000b10  72 73 65 74 3a 20 4c 61  74 69 6e 20 61 6c 70 68  |rset: Latin alph|
00000b20  61 62 65 74 0a 0a 20 20  20 20 20 20 20 20 20 20  |abet..          |
00000b30  20 20 20 20 20 69 74 65  6d 5f 64 65 73 63 72 69  |     item_descri|
00000b40  70 74 69 6f 6e 5f 6c 65  6e 67 74 68 3a 20 34 20  |ption_length: 4 |
00000b50  28 30 78 30 34 29 0a 20  20 20 20 20 20 20 20 20  |(0x04).         |
00000b60  20 20 20 20 20 20 69 74  65 6d 5f 64 65 73 63 72  |      item_descr|
00000b70  69 70 74 69 6f 6e 3a 20  22 79 65 61 72 22 20 20  |iption: "year"  |
00000b80  2d 2d 20 43 68 61 72 73  65 74 3a 20 4c 61 74 69  |-- Charset: Lati|
00000b90  6e 20 61 6c 70 68 61 62  65 74 0a 20 20 20 20 20  |n alphabet.     |
00000ba0  20 20 20 20 20 20 20 20  20 20 69 74 65 6d 5f 6c  |          item_l|
00000bb0  65 6e 67 74 68 3a 20 34  20 28 30 78 30 34 29 0a  |ength: 4 (0x04).|
00000bc0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 69  |               i|
00000bd0  74 65 6d 3a 20 22 32 30  31 33 22 20 20 2d 2d 20  |tem: "2013"  -- |
00000be0  43 68 61 72 73 65 74 3a  20 4c 61 74 69 6e 20 61  |Charset: Latin a|
00000bf0  6c 70 68 61 62 65 74 0a  0a 20 20 20 20 20 20 20  |lphabet..       |
00000c00  20 20 20 20 20 20 20 20  69 74 65 6d 5f 64 65 73  |        item_des|
00000c10  63 72 69 70 74 69 6f 6e  5f 6c 65 6e 67 74 68 3a  |cription_length:|
00000c20  20 37 20 28 30 78 30 37  29 0a 20 20 20 20 20 20  | 7 (0x07).      |
00000c30  20 20 20 20 20 20 20 20  20 69 74 65 6d 5f 64 65  |         item_de|
00000c40  73 63 72 69 70 74 69 6f  6e 3a 20 22 63 6f 75 6e  |scription: "coun|
00000c50  74 72 79 22 20 20 2d 2d  20 43 68 61 72 73 65 74  |try"  -- Charset|
00000c60  3a 20 4c 61 74 69 6e 20  61 6c 70 68 61 62 65 74  |: Latin alphabet|
00000c70  0a 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |.               |
00000c80  69 74 65 6d 5f 6c 65 6e  67 74 68 3a 20 32 33 20  |item_length: 23 |
00000c90  28 30 78 31 37 29 0a 20  20 20 20 20 20 20 20 20  |(0x17).         |
00000ca0  20 20 20 20 20 20 69 74  65 6d 3a 20 22 41 6c 65  |      item: "Ale|
00000cb0  6d 61 6e 68 61 2f 50 6f  72 74 75 67 61 6c 2f 53  |manha/Portugal/S|
00000cc0  75 69 e7 61 22 20 20 2d  2d 20 43 68 61 72 73 65  |ui.a"  -- Charse|
00000cd0  74 3a 20 4c 61 74 69 6e  20 61 6c 70 68 61 62 65  |t: Latin alphabe|
00000ce0  74 0a 0a 20 20 20 20 20  20 20 20 20 20 20 20 74  |t..            t|
Please let me know if you need more info/data.

Thanks

dekarl
Developer
Posts: 228
Joined: Thu Feb 06, 2014 11:01 pm
Germany

Re: charset encoding problem with EIT

Post by dekarl » Wed Jun 24, 2015 1:16 pm

The hexdumps you attached do help, but are not the hexdumps I was looking for :)
dvbsnoop -n 2 -nph 0x12
Can you run that without "-nph"? (aka without "no hexdump")

It appears that they are signalling "Latin alphabet" (ISO/IEC 6937) but the actual data appears to be transmitted in one of the encodings from ISO/IEC 8859.
Code point E7 is ç, which leaves 8859-1, -2, -3, -9, -14, -15, and -16.
Code point E9 appears to be é, but that does not remove possible encodings from the list of candidates.
But a Google search for "ISO 8859 Brazil" hints that it is likely -1 or -15 (with Euro sign). So we can add a fixup to force the encoding to assume Latin-9.
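
The "Latin alphabet" label follows from how DVB text fields announce their encoding. Per ETSI EN 300 468 Annex A, a first byte below 0x20 selects a character table; without one, the default Latin table is assumed. A minimal sketch of that check, where `has_charset_selector` is a hypothetical name and not a MythTV function:

```cpp
#include <cstddef>
#include <cstdint>

// True when a DVB text field starts with a character-table selector,
// i.e. a first byte below 0x20. Without a selector the receiver falls
// back to the default Latin table.
bool has_charset_selector(const std::uint8_t *text, std::size_t len)
{
    return len > 0 && text[0] < 0x20;
}
```

A title whose first byte is a printable letter returns false here, so the receiver has no hint that the bytes are really ISO 8859 data. That is the gap a forced-encoding fixup fills.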

Can you verify which channels/transports need this fixup? We can lock the fixup to a combination of original_network_id plus transport_id (and, if it's only some channels on a transport, then also service_id).
You can find an example for the mapping in eithelper.cpp in case you want to patch it in yourself for testing.

Code:

    // Mark program on the HD transponders as HDTV
    fix[ 10012LL << 32 | 61441U << 16] = EITFixUp::kFixHDTV;
    fix[ 10013LL << 32 | 61441U << 16] = EITFixUp::kFixHDTV;

    // On transport 10004 only DMAX needs no fixing:
    fix[10004LL<<32 | 61441U << 16 | 50403] = // BBC World Service
    fix[10004LL<<32 | 61441U << 16 | 53101] = // BBC Prime (engl)
//...
    fix[10004LL<<32 | 61441U << 16 | 53618] = // K1010
    fix[10004LL<<32 | 61441U << 16 | 53619] = // GemsTV
        EITFixUp::kEFixForceISO8859_15;
The first two entries map transport_ids 10012 and 10013 of original_network_id 61441 to the HDTV fixup. The rest map individual TV services, keyed by ONID, TSID and service_id, to the 8859-15 fixup.
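
The key layout used in those entries can be spelled out as a small helper. This is only a sketch: `FixupKey`, `make_fixup_key` and the accessor functions are hypothetical names for illustration, and the layout (transport_id in bits 32 and up, original_network_id in bits 16 to 31, service_id in bits 0 to 15) is inferred from the entries above.

```cpp
#include <cstdint>

using FixupKey = std::uint64_t;  // hypothetical alias for the 64-bit map key

// Build a key from the three DVB identifiers. A service_id of 0 gives a
// key that covers the whole transport, like the HDTV entries above.
constexpr FixupKey make_fixup_key(std::uint16_t transport_id,
                                  std::uint16_t original_network_id,
                                  std::uint16_t service_id = 0)
{
    // Widen to 64 bits before shifting so no bits are lost.
    return (static_cast<FixupKey>(transport_id) << 32) |
           (static_cast<FixupKey>(original_network_id) << 16) |
            static_cast<FixupKey>(service_id);
}

// Recover the individual identifiers from a key.
constexpr std::uint16_t key_transport_id(FixupKey k)
{
    return static_cast<std::uint16_t>(k >> 32);
}

constexpr std::uint16_t key_network_id(FixupKey k)
{
    return static_cast<std::uint16_t>((k >> 16) & 0xFFFF);
}

constexpr std::uint16_t key_service_id(FixupKey k)
{
    return static_cast<std::uint16_t>(k & 0xFFFF);
}
```

For example, `make_fixup_key(10004, 61441, 50403)` yields the same value as `10004LL<<32 | 61441U << 16 | 50403`.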

stuartm
Developer
Posts: 129
Joined: Wed Feb 05, 2014 5:17 pm
Great Britain

Re: charset encoding problem with EIT

Post by stuartm » Wed Jun 24, 2015 1:17 pm

Thank you for the information, Renato.

renatoriolino
Newcomer
Posts: 6
Joined: Thu Jun 18, 2015 5:06 pm
Brazil

Re: charset encoding problem with EIT

Post by renatoriolino » Wed Jun 24, 2015 3:47 pm

Hi,

Here is the output of dvbsnoop -n 2 0x12 (from the beginning until the event part):

Code:

------------------------------------------------------------
SECT-Packet: 00000001   PID: 18 (0x0012), Length: 212 (0x00d4)
Time received: Wed 2015-06-24  12:29:36.978
------------------------------------------------------------
  0000:  50 f0 d1 00 96 f1 38 c8  00 33 00 01 38 50 8e 64   P.....8..3..8P.d
  0010:  df 6d 21 30 00 00 45 00  10 35 4d 14 70 6f 72 0f   .m!0..E..5M.por.
  0020:  4c 61 e7 6f 73 20 44 65  20 53 61 6e 67 75 65 00   La.os De Sangue.
  0030:  54 02 8c 00 4e 13 00 70  6f 72 00 0d 49 6e 64 69   T...N..por..Indi
  0040:  73 70 6f 6e ed 76 65 6c  2e 55 04 47 42 52 00 8e   spon.vel.U.GBR..
  0050:  65 df 6d 22 15 00 00 45  00 10 2f 4d 0e 70 6f 72   e.m"...E../M.por
  0060:  09 50 6f 64 65 72 6f 73  61 73 00 54 02 83 00 4e   .Poderosas.T...N
  0070:  13 00 70 6f 72 00 0d 49  6e 64 69 73 70 6f 6e ed   ..por..Indispon.
  0080:  76 65 6c 2e 55 04 47 42  52 09 8e 66 df 6d 23 00   vel.U.GBR..f.m#.
  0090:  00 01 00 00 10 3a 4d 19  70 6f 72 14 4a 6f 72 6e   .....:M.por.Jorn
  00a0:  61 6c 20 64 61 20 4d 65  69 61 2d 4e 6f 69 74 65   al da Meia-Noite
  00b0:  00 54 02 83 00 4e 13 00  70 6f 72 00 0d 49 6e 64   .T...N..por..Ind
  00c0:  69 73 70 6f 6e ed 76 65  6c 2e 55 04 47 42 52 00   ispon.vel.U.GBR.
  00d0:  56 ac 2c bb                                        V.,.

PID:  18 (0x0012)  [= assigned for: DVB Event Information Table (EIT)]

Guess table from table id...
EIT-decoding....
Table_ID: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]
section_syntax_indicator: 1 (0x01)
reserved_1: 1 (0x01)
reserved_2: 3 (0x03)
Section_length: 209 (0x00d1)
Service_ID: 150 (0x0096)  [=  --> refers to PMT program_number]
reserved_3: 3 (0x03)
Version_number: 24 (0x18)
current_next_indicator: 1 (0x01)  [= valid now]
Section_number: 56 (0x38)
Last_Section_number: 200 (0xc8)
Transport_stream_ID: 51 (0x0033)
Original_network_ID: 1 (0x0001)  [= Astra Satellite Network 19,2<B0>E | Soci<E9>t<E9> Europ<E9>enne des Satellites]
Segment_last_Section_number: 56 (0x38)
Last_table_id: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]

    Event_ID: 36452 (0x8e64)
    Start_time: 0xdf6d213000 [= 2015-06-24 21:30:00 (UTC)]
    Duration: 0x0004500 [=  00:45:00 (UTC)]
    Running_status: 0 (0x00)  [= undefined]
    Free_CA_mode: 1 (0x01)  [= streams [partially] CA controlled]
    Descriptors_loop_length: 53 (0x35)

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 20 (0x14)
              ISO639_2_language_code:  por
            event_name_length: 15 (0x0f)
            event_name: "La<E7>os De Sangue"  -- Charset: Latin alphabet
            text_length: 0 (0x00)
            text_char: ""
The correct charset for Brazil is ISO8859-1.
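
The choice between ISO 8859-1 and ISO 8859-15 raised earlier in the thread only affects eight byte values; every other byte decodes identically in both. A small sketch for checking whether a given text field contains any of them (the function names are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

// True for the eight byte values that decode differently in
// ISO 8859-1 and ISO 8859-15.
bool differs_between_latin1_and_latin9(std::uint8_t byte)
{
    switch (byte)
    {
        case 0xA4:  // currency sign vs. euro sign
        case 0xA6:  // broken bar vs. S with caron
        case 0xA8:  // diaeresis vs. s with caron
        case 0xB4:  // acute accent vs. Z with caron
        case 0xB8:  // cedilla vs. z with caron
        case 0xBC:  // one quarter vs. OE ligature
        case 0xBD:  // one half vs. oe ligature
        case 0xBE:  // three quarters vs. Y with diaeresis
            return true;
        default:
            return false;
    }
}

// True when decoding `text` as ISO 8859-1 or ISO 8859-15 would give
// different results.
bool choice_matters(const std::uint8_t *text, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        if (differs_between_latin1_and_latin9(text[i]))
            return true;
    }
    return false;
}
```

The accented bytes seen in this thread (e7, e3, e1, ed, e9, f3, aa, c1) are not in that set, so either fixup would render them the same way.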

The charset is wrong on all channels, in both the event name and the event description (extended_event_descriptor). Below is an example of a description that is wrong too:

Code:

------------------------------------------------------------
SECT-Packet: 00000002   PID: 18 (0x0012), Length: 236 (0x00ec)
Time received: Wed 2015-06-24  12:29:36.978
------------------------------------------------------------
  0000:  4e f0 e9 00 66 e5 01 01  00 02 00 01 01 4e 19 37   N...f........N.7
  0010:  df 6d 15 30 00 00 30 00  10 ce 4d 13 70 6f 72 0e   .m.0..0...M.por.
  0020:  44 69 6e 6f 20 41 76 65  6e 74 75 72 61 73 00 54   Dino Aventuras.T
  0030:  02 51 00 4e ad 00 70 6f  72 00 a7 41 20 62 6f 72   .Q.N..por..A bor
  0040:  64 6f 20 64 6f 20 62 61  72 63 6f 20 76 6f 61 64   do do barco voad
  0050:  6f 72 20 41 74 6c 61 73  2c 20 61 20 74 72 69 70   or Atlas, a trip
  0060:  75 6c 61 e7 e3 6f 20 66  6f 72 6d 61 64 61 20 70   ula..o formada p
  0070:  6f 72 20 44 69 6e 6f 2c  20 4b 69 6b 61 2c 20 4c   or Dino, Kika, L
  0080:  69 70 20 65 20 43 61 63  61 75 20 76 69 61 6a 61   ip e Cacau viaja
  0090:  20 61 63 69 6d 61 20 64  61 73 20 6e 75 76 65 6e    acima das nuven
  00a0:  73 20 65 6d 20 75 6d 20  75 6e 69 76 65 72 73 6f   s em um universo
  00b0:  20 6d e1 67 69 63 6f 20  63 6f 6d 70 6f 73 74 6f    m.gico composto
  00c0:  20 70 6f 72 20 66 61 6e  74 e1 73 74 69 63 61 73    por fant.sticas
  00d0:  20 69 6c 68 61 73 20 66  6c 75 74 75 61 6e 74 65    ilhas flutuante
  00e0:  73 2e 55 04 47 42 52 00  91 8d 45 3d               s.U.GBR...E=

PID:  18 (0x0012)  [= assigned for: DVB Event Information Table (EIT)]

Guess table from table id...
EIT-decoding....
Table_ID: 78 (0x4e)  [= Event Information Table (EIT) - actual transport stream, present/following]
section_syntax_indicator: 1 (0x01)
reserved_1: 1 (0x01)
reserved_2: 3 (0x03)
Section_length: 233 (0x00e9)
Service_ID: 102 (0x0066)  [=  --> refers to PMT program_number]
reserved_3: 3 (0x03)
Version_number: 18 (0x12)
current_next_indicator: 1 (0x01)  [= valid now]
Section_number: 1 (0x01)
Last_Section_number: 1 (0x01)
Transport_stream_ID: 2 (0x0002)
Original_network_ID: 1 (0x0001)  [= Astra Satellite Network 19,2<B0>E | Soci<E9>t<E9> Europ<E9>enne des Satellites]
Segment_last_Section_number: 1 (0x01)
Last_table_id: 78 (0x4e)  [= Event Information Table (EIT) - actual transport stream, present/following]

    Event_ID: 6455 (0x1937)
    Start_time: 0xdf6d153000 [= 2015-06-24 15:30:00 (UTC)]
    Duration: 0x0003000 [=  00:30:00 (UTC)]
    Running_status: 0 (0x00)  [= undefined]
    Free_CA_mode: 1 (0x01)  [= streams [partially] CA controlled]
    Descriptors_loop_length: 206 (0xce)

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 19 (0x13)
              ISO639_2_language_code:  por
            event_name_length: 14 (0x0e)
            event_name: "Dino Aventuras"  -- Charset: Latin alphabet
            text_length: 0 (0x00)
            text_char: ""

            DVB-DescriptorTag: 84 (0x54)  [= content_descriptor]
            descriptor_length: 2 (0x02)
               Content_nibble_level_1: 5 (0x05)
               Content_nibble_level_2: 1 (0x01)
                  [= pre-school children's program]
               User_nibble_1: 0 (0x00)
               User_nibble_2: 0 (0x00)


            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 173 (0xad)
            descriptor_number: 0 (0x00)
            last_descriptor_number: 0 (0x00)
            ISO639_2_language_code:  por
            length_of_items: 0 (0x00)

            text_length: 167 (0xa7)
            text: "A bordo do barco voador Atlas, a tripula<E7><E3>o formada por Dino, Kika, Lip e Cacau viaja acima das nuvens em um universo m<E1>gico composto por fant<E1>sticas ilhas flutuantes."  -- Charset: Latin alphabet

            DVB-DescriptorTag: 85 (0x55)  [= parental_rating_descriptor]
            descriptor_length: 4 (0x04)
               Country_code:  GBR
               Rating:  0 (0x00)  [= undefined]


CRC: 2441954621 (0x918d453d)
==========================================================
E7 = ç
E3 = ã
E1 = á


I know how to edit a C++ source file and compile MythTV, but I'm not sure what I should put in the file for testing.

Looking at the file I found lines like this:

fix[ 256U << 16] |= EITFixUp::kEFixForceISO8859_1;

But I'm not sure what to put in place of 256U << 16. I see a comment that says: transport_id<<32 | network_id<<16 | service_id.

In my case network_id would be 1, right? One transport ID is 51. Is it correct to add this to eithelper.cpp?

fix[ 51 << 32 | 1 << 16] |= EITFixUp::kEFixForceISO8859_1;

Thanks

dekarl
Developer
Posts: 228
Joined: Thu Feb 06, 2014 11:01 pm
Germany

Re: charset encoding problem with EIT

Post by dekarl » Wed Jun 24, 2015 4:23 pm

renatoriolino wrote:In my case network_id would be 1, right? One transport ID is 51. Is correct to add this to eithelper.cpp?

fix[ 51 << 32 | 1 << 16] = EITFixUp::kEFixForceISO8859_1;
Yes, that is correct. There was only a small typo: |= instead of just =.
According to Lyngsat, Astra (ONID 1) starts its transport IDs around 1000, so this should fix your guide without breaking the guide of others :-)
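
One detail worth checking when patching this in: the existing entries write the transport ID with an LL suffix (10012LL << 32). A plain int literal such as 51 is normally 32 bits wide, and shifting it by 32 is undefined in C++, so the 64-bit form is the safe one. The sketch below registers several transports at once with explicit 64-bit keys. `kForceISO8859_1` is a hypothetical stand-in for EITFixUp::kEFixForceISO8859_1, and `FixupMap` and `add_transport_fixups` are illustrative names, not MythTV API.

```cpp
#include <cstdint>
#include <initializer_list>
#include <map>

// Hypothetical stand-in for EITFixUp::kEFixForceISO8859_1; the real
// constant and its value are defined inside MythTV.
constexpr std::uint32_t kForceISO8859_1 = 1u << 0;

using FixupMap = std::map<std::uint64_t, std::uint32_t>;

// Add one transport-level entry per transport ID, keyed as
// transport_id << 32 | original_network_id << 16.
void add_transport_fixups(FixupMap &fix,
                          std::uint16_t original_network_id,
                          std::initializer_list<std::uint16_t> transport_ids,
                          std::uint32_t flags)
{
    for (const std::uint16_t tsid : transport_ids)
    {
        // Widen to 64 bits before shifting by 32.
        const std::uint64_t key =
            (static_cast<std::uint64_t>(tsid) << 32) |
            (static_cast<std::uint64_t>(original_network_id) << 16);
        fix[key] = flags;
    }
}
```

With the transport IDs that appear in the dumps in this thread, a call would look like `add_transport_fixups(fix, 1, {1, 2, 51, 52, 54}, kForceISO8859_1);`. No network-wide key for ONID 1 is created, so Astra transports with IDs around 1000 stay untouched.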

Thanks for the hexdumps. I would like to get one last hexdump, showing an item list with the actors. (If there is other interesting data, a hexdump of that would be nice, too)

renatoriolino
Newcomer
Posts: 6
Joined: Thu Jun 18, 2015 5:06 pm
Brazil

Re: charset encoding problem with EIT

Post by renatoriolino » Wed Jun 24, 2015 4:36 pm

I'll try to change the cpp file and will post the results here after testing.

Here is another dvbsnoop output:

Code:

------------------------------------------------------------
SECT-Packet: 00000001   PID: 18 (0x0012), Length: 453 (0x01c5)
Time received: Wed 2015-06-24  13:25:36.984
------------------------------------------------------------
  0000:  4e f1 c2 00 54 fd 00 01  00 34 00 01 01 4e 75 8f   N...T....4...Nu.
  0010:  df 6d 16 00 00 01 00 00  11 a7 4d 1b 70 6f 72 16   .m........M.por.
  0020:  43 61 73 74 6c 65 20 33  aa 20 54 65 6d 70 2e 20   Castle 3. Temp. 
  0030:  45 70 2e 20 32 30 00 54  02 70 00 4e fe 01 70 6f   Ep. 20.T.p.N..po
  0040:  72 c1 08 53 65 72 69 65  73 49 44 07 35 36 39 34   r..SeriesID.5694
  0050:  38 30 33 09 45 70 69 73  6f 64 65 49 44 02 32 30   803.EpisodeID.20
  0060:  06 41 63 74 6f 72 73 70  53 74 61 6e 61 20 4b 61   .ActorspStana Ka
  0070:  74 69 63 20 2c 20 4e 61  74 68 61 6e 20 46 69 6c   tic , Nathan Fil
  0080:  6c 69 6f 6e 20 2c 20 53  75 73 61 6e 20 53 75 6c   lion , Susan Sul
  0090:  6c 69 76 61 6e 2c 20 52  75 62 65 6e 20 53 61 6e   livan, Ruben San
  00a0:  74 69 61 67 6f 2d 48 75  64 73 6f 6e 20 2c 20 53   tiago-Hudson , S
  00b0:  65 61 6d 75 73 20 44 65  76 65 72 20 2c 20 54 61   eamus Dever , Ta
  00c0:  6d 61 6c 61 20 4a 6f 6e  65 73 2c 20 47 72 65 67   mala Jones, Greg
  00d0:  67 20 44 61 6e 69 65 6c  08 44 69 72 65 63 74 6f   g Daniel.Directo
  00e0:  72 0b 53 74 65 76 65 20  42 6f 79 75 6d 04 79 65   r.Steve Boyum.ye
  00f0:  61 72 04 32 30 31 31 07  63 6f 75 6e 74 72 79 03   ar.2011.country.
  0100:  45 55 41 37 43 61 73 74  6c 65 20 65 20 42 65 63   EUA7Castle e Bec
  0110:  6b 65 74 74 20 69 6e 76  65 73 74 69 67 61 6d 20   kett investigam 
  0120:  6f 20 63 61 73 6f 20 64  65 20 75 6d 20 68 6f 6d   o caso de um hom
  0130:  65 6d 20 65 6e 63 6f 6e  74 72 61 4e 7e 11 70 6f   em encontraN~.po
  0140:  72 00 78 64 6f 20 6d 6f  72 74 6f 20 64 65 6e 74   r.xdo morto dent
  0150:  72 6f 20 64 65 20 75 6d  20 66 6f 72 6e 6f 20 64   ro de um forno d
  0160:  65 20 70 69 7a 7a 61 2c  20 6d 61 73 20 61 63 61   e pizza, mas aca
  0170:  62 6f 75 20 72 65 76 65  6c 61 6e 64 6f 20 6e e3   bou revelando n.
  0180:  6f 20 73 65 72 20 61 6c  67 75 e9 6d 20 61 73 73   o ser algu.m ass
  0190:  6f 63 69 61 64 6f 20 63  6f 6d 20 6f 20 72 65 73   ociado com o res
  01a0:  74 61 75 72 61 6e 74 65  20 2d 20 65 72 61 20 75   taurante - era u
  01b0:  6d 20 72 65 70 f3 72 74  65 72 2e 55 04 47 42 52   m rep.rter.U.GBR
  01c0:  0b a7 1d 3f 5f                                     ...?_

PID:  18 (0x0012)  [= assigned for: DVB Event Information Table (EIT)]

Guess table from table id...
EIT-decoding....
Table_ID: 78 (0x4e)  [= Event Information Table (EIT) - actual transport stream, present/following]
section_syntax_indicator: 1 (0x01)
reserved_1: 1 (0x01)
reserved_2: 3 (0x03)
Section_length: 450 (0x01c2)
Service_ID: 84 (0x0054)  [=  --> refers to PMT program_number]
reserved_3: 3 (0x03)
Version_number: 30 (0x1e)
current_next_indicator: 1 (0x01)  [= valid now]
Section_number: 0 (0x00)
Last_Section_number: 1 (0x01)
Transport_stream_ID: 52 (0x0034)
Original_network_ID: 1 (0x0001)  [= Astra Satellite Network 19,2<B0>E | Soci<E9>t<E9> Europ<E9>enne des Satellites]
Segment_last_Section_number: 1 (0x01)
Last_table_id: 78 (0x4e)  [= Event Information Table (EIT) - actual transport stream, present/following]

    Event_ID: 30095 (0x758f)
    Start_time: 0xdf6d160000 [= 2015-06-24 16:00:00 (UTC)]
    Duration: 0x0010000 [=  01:00:00 (UTC)]
    Running_status: 0 (0x00)  [= undefined]
    Free_CA_mode: 1 (0x01)  [= streams [partially] CA controlled]
    Descriptors_loop_length: 423 (0x1a7)

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 27 (0x1b)
              ISO639_2_language_code:  por
            event_name_length: 22 (0x16)
            event_name: "Castle 3<AA> Temp. Ep. 20"  -- Charset: Latin alphabet
            text_length: 0 (0x00)
            text_char: ""

            DVB-DescriptorTag: 84 (0x54)  [= content_descriptor]
            descriptor_length: 2 (0x02)
               Content_nibble_level_1: 7 (0x07)
               Content_nibble_level_2: 0 (0x00)
                  [= arts/culture (without music, general)]
               User_nibble_1: 0 (0x00)
               User_nibble_2: 0 (0x00)


            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 254 (0xfe)
            descriptor_number: 0 (0x00)
            last_descriptor_number: 1 (0x01)
            ISO639_2_language_code:  por
            length_of_items: 193 (0xc1)

               item_description_length: 8 (0x08)
               item_description: "SeriesID"  -- Charset: Latin alphabet
               item_length: 7 (0x07)
               item: "5694803"  -- Charset: Latin alphabet

               item_description_length: 9 (0x09)
               item_description: "EpisodeID"  -- Charset: Latin alphabet
               item_length: 2 (0x02)
               item: "20"  -- Charset: Latin alphabet

               item_description_length: 6 (0x06)
               item_description: "Actors"  -- Charset: Latin alphabet
               item_length: 112 (0x70)
               item: "Stana Katic , Nathan Fillion , Susan Sullivan, Ruben Santiago-Hudson , Seamus Dever , Tamala Jones, Gregg Daniel"  -- Charset: Latin alphabet

               item_description_length: 8 (0x08)
               item_description: "Director"  -- Charset: Latin alphabet
               item_length: 11 (0x0b)
               item: "Steve Boyum"  -- Charset: Latin alphabet

               item_description_length: 4 (0x04)
               item_description: "year"  -- Charset: Latin alphabet
               item_length: 4 (0x04)
               item: "2011"  -- Charset: Latin alphabet

               item_description_length: 7 (0x07)
               item_description: "country"  -- Charset: Latin alphabet
               item_length: 3 (0x03)
               item: "EUA"  -- Charset: Latin alphabet

            text_length: 55 (0x37)
            text: "Castle e Beckett investigam o caso de um homem encontra"  -- Charset: Latin alphabet

            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 126 (0x7e)
            descriptor_number: 1 (0x01)
            last_descriptor_number: 1 (0x01)
            ISO639_2_language_code:  por
            length_of_items: 0 (0x00)

            text_length: 120 (0x78)
            text: "do morto dentro de um forno de pizza, mas acabou revelando n<E3>o ser algu<E9>m associado com o restaurante - era um rep<F3>rter."  -- Charset: Latin alphabet

            DVB-DescriptorTag: 85 (0x55)  [= parental_rating_descriptor]
            descriptor_length: 4 (0x04)
               Country_code:  GBR
               Rating:  11 (0x0b)  [= minimum age: 14 years]


CRC: 2803711839 (0xa71d3f5f)
==========================================================
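
The item loop in the dump above has a simple length-prefixed layout: after the tag, length, descriptor numbers, language code and length_of_items come repeated (item_description_length, item_description, item_length, item) groups. A minimal parsing sketch under that reading of the dump follows. `parse_items` and `Item` are hypothetical names, and the strings are returned as raw bytes without any charset conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Item = std::pair<std::string, std::string>;  // (item_description, item)

// Parse the item loop of one extended_event_descriptor.
// `d` points at the descriptor_tag byte, `len` is the number of bytes
// available. Returns an empty list if the data is malformed.
std::vector<Item> parse_items(const std::uint8_t *d, std::size_t len)
{
    std::vector<Item> items;

    // tag(1) length(1) numbers(1) language(3) length_of_items(1)
    if (len < 7 || d[0] != 0x4E)
        return items;

    const std::size_t desc_end = 2 + static_cast<std::size_t>(d[1]);
    if (desc_end > len)
        return items;

    std::size_t pos = 7;
    const std::size_t items_end = pos + d[6];
    if (items_end > desc_end)
        return items;

    while (pos < items_end)
    {
        const std::size_t desc_len = d[pos++];
        // The description must leave room for the item_length byte.
        if (pos + desc_len >= items_end)
        {
            items.clear();
            return items;
        }
        std::string desc(reinterpret_cast<const char *>(d + pos), desc_len);
        pos += desc_len;

        const std::size_t item_len = d[pos++];
        if (pos + item_len > items_end)
        {
            items.clear();
            return items;
        }
        std::string value(reinterpret_cast<const char *>(d + pos), item_len);
        pos += item_len;

        items.emplace_back(std::move(desc), std::move(value));
    }
    return items;
}
```

Applied to the first extended_event_descriptor above, this should yield the SeriesID, EpisodeID, Actors, Director, year and country pairs. The free text that follows the item loop is not handled here.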
And another one:

Code:

------------------------------------------------------------
SECT-Packet: 00000003   PID: 18 (0x0012), Length: 443 (0x01bb)
Time received: Wed 2015-06-24  13:25:36.996
------------------------------------------------------------
  0000:  50 f1 b8 00 3f fd 10 c8  00 01 00 01 10 50 60 30   P...?........P`0
  0010:  df 6d 07 15 00 02 05 00  11 9d 4d 13 70 6f 72 0e   .m........M.por.
  0020:  4f 20 50 65 73 6f 20 64  61 20 c1 67 75 61 00 54   O Peso da .gua.T
  0030:  02 4d 00 4e fe 01 70 6f  72 e1 06 41 63 74 6f 72   .M.N..por..Actor
  0040:  73 9c 43 61 74 68 65 72  69 6e 65 20 4d 63 43 6f   s.Catherine McCo
  0050:  72 6d 61 63 6b 2c 20 53  61 72 61 68 20 50 6f 6c   rmack, Sarah Pol
  0060:  6c 65 79 2c 20 53 65 61  6e 20 50 65 6e 6e 2c 20   ley, Sean Penn, 
  0070:  4a 6f 73 68 20 4c 75 63  61 73 2c 20 45 6c 69 7a   Josh Lucas, Eliz
  0080:  61 62 65 74 68 20 48 75  72 6c 65 79 2c 20 43 69   abeth Hurley, Ci
  0090:  61 72 e1 6e 20 48 69 6e  64 73 2c 20 55 6c 72 69   ar.n Hinds, Ulri
  00a0:  63 68 20 54 68 6f 6d 73  65 6e 2c 20 41 6e 64 65   ch Thomsen, Ande
  00b0:  72 73 20 57 2e 20 42 65  72 74 68 65 6c 73 65 6e   rs W. Berthelsen
  00c0:  2c 20 4a 6f 68 6e 20 4d  61 63 6c 61 72 65 6e 2c   , John Maclaren,
  00d0:  20 4a 6f 73 65 70 68 20  52 75 74 74 65 6e 08 44    Joseph Rutten.D
  00e0:  69 72 65 63 74 6f 72 0f  4b 61 74 68 72 79 6e 20   irector.Kathryn 
  00f0:  42 69 67 65 6c 6f 77 04  79 65 61 72 04 32 30 30   Bigelow.year.200
  0100:  30 07 63 6f 75 6e 74 72  79 11 45 55 41 2f 46 72   0.country.EUA/Fr
  0110:  61 6e e7 61 2f 43 61 6e  61 64 e1 17 44 75 72 61   an.a/Canad..Dura
  0120:  6e 74 65 20 76 69 61 67  65 6d 20 63 6f 6d 20 66   nte viagem com f
  0130:  69 6e 73 4e 7c 11 70 6f  72 00 76 20 64 65 20 64   insN|.por.v de d
  0140:  65 73 76 65 6e 64 61 72  20 75 6d 20 61 6e 74 69   esvendar um anti
  0150:  67 6f 20 63 72 69 6d 65  2c 20 64 6f 69 73 20 63   go crime, dois c
  0160:  61 73 61 69 73 20 64 65  73 63 6f 62 72 65 6d 20   asais descobrem 
  0170:  6d 75 69 74 6f 20 6d 61  69 73 20 73 6f 62 72 65   muito mais sobre
  0180:  20 65 6c 65 73 20 64 6f  20 71 75 65 20 73 6f 62    eles do que sob
  0190:  72 65 20 6f 20 61 73 73  75 6e 74 6f 20 71 75 65   re o assunto que
  01a0:  20 6f 73 20 6c 65 76 6f  75 20 61 74 e9 20 6c e1    os levou at. l.
  01b0:  2e 55 04 47 42 52 0d 06  67 fe c2                  .U.GBR..g..

PID:  18 (0x0012)  [= assigned for: DVB Event Information Table (EIT)]

Guess table from table id...
EIT-decoding....
Table_ID: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]
section_syntax_indicator: 1 (0x01)
reserved_1: 1 (0x01)
reserved_2: 3 (0x03)
Section_length: 440 (0x01b8)
Service_ID: 63 (0x003f)  [=  --> refers to PMT program_number]
reserved_3: 3 (0x03)
Version_number: 30 (0x1e)
current_next_indicator: 1 (0x01)  [= valid now]
Section_number: 16 (0x10)
Last_Section_number: 200 (0xc8)
Transport_stream_ID: 1 (0x0001)
Original_network_ID: 1 (0x0001)  [= Astra Satellite Network 19,2<B0>E | Soci<E9>t<E9> Europ<E9>enne des Satellites]
Segment_last_Section_number: 16 (0x10)
Last_table_id: 80 (0x50)  [= Event Information Table (EIT) - actual transport stream, schedule]

    Event_ID: 24624 (0x6030)
    Start_time: 0xdf6d071500 [= 2015-06-24 07:15:00 (UTC)]
    Duration: 0x0020500 [=  02:05:00 (UTC)]
    Running_status: 0 (0x00)  [= undefined]
    Free_CA_mode: 1 (0x01)  [= streams [partially] CA controlled]
    Descriptors_loop_length: 413 (0x19d)

            DVB-DescriptorTag: 77 (0x4d)  [= short_event_descriptor]
            descriptor_length: 19 (0x13)
              ISO639_2_language_code:  por
            event_name_length: 14 (0x0e)
            event_name: "O Peso da <C1>gua"  -- Charset: Latin alphabet
            text_length: 0 (0x00)
            text_char: ""

            DVB-DescriptorTag: 84 (0x54)  [= content_descriptor]
            descriptor_length: 2 (0x02)
               Content_nibble_level_1: 4 (0x04)
               Content_nibble_level_2: 13 (0x0d)
                  [= reserved]
               User_nibble_1: 0 (0x00)
               User_nibble_2: 0 (0x00)


            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 254 (0xfe)
            descriptor_number: 0 (0x00)
            last_descriptor_number: 1 (0x01)
            ISO639_2_language_code:  por
            length_of_items: 225 (0xe1)

               item_description_length: 6 (0x06)
               item_description: "Actors"  -- Charset: Latin alphabet
               item_length: 156 (0x9c)
               item: "Catherine McCormack, Sarah Polley, Sean Penn, Josh Lucas, Elizabeth Hurley, Ciar<E1>n Hinds, Ulrich Thomsen, Anders W. Berthelsen, John Maclaren, Joseph Rutten"  -- Charset: Latin alphabet

               item_description_length: 8 (0x08)
               item_description: "Director"  -- Charset: Latin alphabet
               item_length: 15 (0x0f)
               item: "Kathryn Bigelow"  -- Charset: Latin alphabet

               item_description_length: 4 (0x04)
               item_description: "year"  -- Charset: Latin alphabet
               item_length: 4 (0x04)
               item: "2000"  -- Charset: Latin alphabet

               item_description_length: 7 (0x07)
               item_description: "country"  -- Charset: Latin alphabet
               item_length: 17 (0x11)
               item: "EUA/Fran<E7>a/Canad<E1>"  -- Charset: Latin alphabet

            text_length: 23 (0x17)
            text: "Durante viagem com fins"  -- Charset: Latin alphabet

            DVB-DescriptorTag: 78 (0x4e)  [= extended_event_descriptor]
            descriptor_length: 124 (0x7c)
            descriptor_number: 1 (0x01)
            last_descriptor_number: 1 (0x01)
            ISO639_2_language_code:  por
            length_of_items: 0 (0x00)

            text_length: 118 (0x76)
            text: " de desvendar um antigo crime, dois casais descobrem muito mais sobre eles do que sobre o assunto que os levou at<E9> l<E1>."  -- Charset: Latin alphabet

            DVB-DescriptorTag: 85 (0x55)  [= parental_rating_descriptor]
            descriptor_length: 4 (0x04)
               Country_code:  GBR
               Rating:  13 (0x0d)  [= minimum age: 16 years]


CRC: 107478722 (0x0667fec2)
==========================================================
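The decoded header fields above can be cross-checked against the first bytes of the hex dump. This is a minimal sketch, assuming the standard EIT section header layout; EitHeader and parse_eit_header are made-up names for illustration and are not dvbsnoop or MythTV code:

```cpp
#include <cassert>
#include <cstdint>

// Fixed part of an EIT section header (illustrative struct, not from MythTV).
struct EitHeader {
    uint8_t  table_id;
    uint16_t section_length;
    uint16_t service_id;
    uint8_t  version_number;
    bool     current_next;
    uint8_t  section_number;
    uint8_t  last_section_number;
    uint16_t transport_stream_id;
    uint16_t original_network_id;
};

// Hypothetical helper: unpack the header from at least 12 raw section bytes.
EitHeader parse_eit_header(const uint8_t *b)
{
    assert(b != nullptr);
    EitHeader h{};
    h.table_id            = b[0];
    // Lower 4 bits of byte 1 plus byte 2 form the 12-bit section length.
    h.section_length      = static_cast<uint16_t>(((b[1] & 0x0f) << 8) | b[2]);
    h.service_id          = static_cast<uint16_t>((b[3] << 8) | b[4]);
    // Byte 5: 2 reserved bits, 5-bit version, 1-bit current_next_indicator.
    h.version_number      = static_cast<uint8_t>((b[5] >> 1) & 0x1f);
    h.current_next        = (b[5] & 0x01) != 0;
    h.section_number      = b[6];
    h.last_section_number = b[7];
    h.transport_stream_id = static_cast<uint16_t>((b[8] << 8) | b[9]);
    h.original_network_id = static_cast<uint16_t>((b[10] << 8) | b[11]);
    return h;
}
```

Fed the first 12 bytes of the dump (50 f1 b8 00 3f fd 10 c8 00 01 00 01), this gives table_id 0x50, section_length 440, service_id 63, version 30, section 16 of 200, and transport stream ID and original network ID both 1, matching the decoded values.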
Let me know if you need more.

Thanks

Re: charset encoding problem with EIT

Post by renatoriolino » Wed Jun 24, 2015 6:44 pm

Looks like it didn't work.

I added the following to the file mythtv/libs/libmythtv/eithelper.cpp, at the beginning of the init_fixup function:

Code: Select all

    // DVB-S StarOne C2 70W
    // The LL suffix makes the << 32 shift 64-bit; shifting a plain int by 32 is undefined.
    fix[ 1LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 2LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 3LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 4LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 50LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 51LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 52LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 53LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 54LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 55LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 56LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 57LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 58LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
    fix[ 59LL << 32 | 1 << 16 ] = EITFixUp::kEFixForceISO8859_1;
After recompiling, I ran mythtv-setup again and did a new channel scan. All channels that have accents in their names still show the wrong charset. EPG data is still wrong too.

Did I make a mistake? Is there any log that I should check?
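One thing worth checking with keys like these: the shift by 32 has to be done in a 64-bit type, because shifting a plain int (32 bits on common platforms) by 32 is undefined behaviour in C++. Below is a minimal sketch of building such a key, assuming the layout transport_id << 32 | network_id << 16 | service_id that the entries above suggest. make_fixup_key is a hypothetical helper, not a MythTV function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack the DVB identifiers into one 64-bit key.
// Assumed layout: transport_id << 32 | network_id << 16 | service_id.
// A service_id of 0 is used here to mean "every service on that transport".
constexpr uint64_t make_fixup_key(uint16_t transport_id,
                                  uint16_t network_id,
                                  uint16_t service_id = 0)
{
    // Widen to 64 bits before shifting so the upper bits are not lost.
    return (static_cast<uint64_t>(transport_id) << 32) |
           (static_cast<uint64_t>(network_id) << 16) |
           static_cast<uint64_t>(service_id);
}

// Compile-time check: transport 1 on network 1 lands in bits 32 and 16.
static_assert(make_fixup_key(1, 1) == 0x0000000100010000ULL,
              "transport_id must occupy bits 32 and up");
```

With this, fix[make_fixup_key(1, 1)] would address transport stream 1 on network 1, which are the IDs shown in the dvbsnoop output.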

Thanks

Re: charset encoding problem with EIT

Post by renatoriolino » Thu Jun 25, 2015 7:37 pm

OK, it is actually working for EPG data. I see now that the newly grabbed data has the correct charset. But channel names are still incorrect.

dekarl, is it possible to force ISO8859-1 for channel names too?
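As a rough illustration of what forcing ISO8859-1 amounts to: the strings in the dumps start directly with text and carry no character-table selector byte, so a receiver following the DVB rules would fall back to the default Latin table (based on ISO 6937, as far as the DVB SI specification goes), while bytes such as 0xC1 or 0xE9 only make sense as ISO8859-1. The sketch below assumes that reading and simply maps each byte to the Unicode code point of the same value. latin1_to_utf8 is a hypothetical name, and this is not how MythTV implements the fixup:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treat every input byte as an ISO8859-1 character
// (byte value == Unicode code point) and re-encode the text as UTF-8.
std::string latin1_to_utf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            // Plain ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For the event name from the second dump, the bytes "O Peso da \xC1gua" would come out with 0xC1 replaced by the two bytes 0xC3 0x81, which is the UTF-8 form of the accented capital A.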

Thanks
