15 March, 2018

EXIF Tagging and international character sets

I have just scanned approximately 2000 slides from my parents, and I decided to tag them to make them easier to search. The goal is to store the tags inside the JPG files in a standardized way that does not depend on any specific software. The first idea was to use the standard EXIF fields for this purpose.

This post is not finished, but already may contain valuable information. Please use it at your own risk.


I am planning to use the following kinds of tags: the people in the picture; the location where the picture was taken (in some cases this is very precise, in others only the country or the region is known); the date when the picture was taken (again, in some cases I have the exact date, in others I am only guessing); and finally some kind of "theme", since I have plenty of photos shot in zoos, on beaches and of nice flowers.

IPTC standard is available here.

After some searching and experimenting it quickly turned out that EXIF has limited tagging support, but JPG files can also carry IPTC and XMP metadata, which are used for tagging. There seems to be no common agreement on which one to use, so a lot of software writes both of them.

Originally I planned to tag people manually, but after trying out Picasa I decided to use its automatic tagging, because besides recognizing people it also records where the faces are in the picture. For the location, geotagging is a good way, but I decided to use it only when I know the exact location; when I know only the country or region, I will use normal tags. The reason for this is that it is misleading if the geotag points to an exact location rather than to the country, and the original location expression is not retained (at least not in Picasa).

I am using both Linux and Windows computers at home, and I was very happy to find this blog post about EXIF tagging on Linux: https://beckustech.wordpress.com/2013/03/12/tagging-jpeg-image-files/. After trying it out, I came across the problem that neither the character encoding of the tags nor its handling in different programs is trivial, so I decided to test various programs and see what tags they set.

I have tested the following Windows programs:

  • Windows Explorer
  • iTag
  • XnViewMP
  • Picasa

During the test I took a "virgin" picture, as it came out of the scanner, and filled in as many fields as was feasible with the given program. XnViewMP is very complex, so there I did not try out every possible IPTC and XMP field. After tagging, I looked at the files on Linux with exiv2 and exiftool to see the added tags, then checked how they were displayed in the other programs.

Plain file

The original file had the following attributes (as displayed with exiv2):

Exif.Image.Software                          Ascii      24  ArcSoft MediaImpression
Exif.Image.DateTime                          Ascii      20  2015:12:11 09:08:27
Exif.Image.ExifTag                           Long        1  94
Exif.Photo.ExifVersion                       Undefined   6  (48 50 50 48 0 0)
Exif.Photo.PixelXDimension                   Long        1  5040
Exif.Photo.PixelYDimension                   Long        1  3360
Exif.Thumbnail.Compression                   Short       1  JPEG (old-style)
Exif.Thumbnail.XResolution                   Rational    1  72
Exif.Thumbnail.YResolution                   Rational    1  72
Exif.Thumbnail.ResolutionUnit                Short       1  inch
Exif.Thumbnail.JPEGInterchangeFormat         Long        1  236
Exif.Thumbnail.JPEGInterchangeFormatLength   Long        1  8304

exiftool basically displays the same information.

Windows Explorer (Windows 10)

In Windows Explorer, the Properties/Details page offered the following fields (I use Hungarian Windows, so the names are translated back to English):

  • Title (Cím)
  • Subject (Tárgy)
  • Rating (Minősítés)
  • Tags (Címkék)
  • Comment (Megjegyzés)

After adding all these fields, I got the following tags with exiv2:

Exif.Image.ImageDescription                  Ascii      35  WE cím: árvíztűrő ütvefúró
Exif.Image.Software                          Ascii      24  ArcSoft MediaImpression
Exif.Image.DateTime                          Ascii      20  2015:12:11 09:08:27
Exif.Image.Rating                            Short       1  4
Exif.Image.RatingPercent                     Short       1  75
Exif.Image.ExifTag                           Long        1  2286
Exif.Photo.ExifVersion                       Undefined   6  (48 50 50 48 0 0)
Exif.Photo.PixelXDimension                   Long        1  5040
Exif.Photo.PixelYDimension                   Long        1  3360
Exif.Image.XPTitle                           Byte       54  WE cím: árvíztűrő ütvefúró
Exif.Image.XPComment                         Byte       68  WE megjegyzés: árvíztűrő ütvefúró
Exif.Image.XPKeywords                        Byte      120  WE cimke: árvíztűrő ütvefúró1;WE címke: árvíztűrő ütvefúró2
Exif.Image.XPSubject                         Byte       58  WE tárgy: árvíztűrő ütvefúró
Exif.Thumbnail.Compression                   Short       1  JPEG (old-style)
Exif.Thumbnail.XResolution                   Rational    1  72
Exif.Thumbnail.YResolution                   Rational    1  72
Exif.Thumbnail.ResolutionUnit                Short       1  inch
Exif.Thumbnail.JPEGInterchangeFormat         Long        1  4800
Exif.Thumbnail.JPEGInterchangeFormatLength   Long        1  4970
Xmp.dc.title                                 LangAlt     1  lang="x-default" WE cím: árvíztűrő ütvefúró
Xmp.dc.description                           LangAlt     1  lang="x-default" WE cím: árvíztűrő ütvefúró
Xmp.dc.subject                               XmpBag      2  WE cimke: árvíztűrő ütvefúró1, WE címke: árvíztűrő ütvefúró2
Xmp.xmp.Rating                               XmpText     1  4
Xmp.MicrosoftPhoto.Rating                    XmpText     2  75
Xmp.MicrosoftPhoto.LastKeywordXMP            XmpBag      2  WE cimke: árvíztűrő ütvefúró1, WE címke: árvíztűrő ütvefúró2
Xmp.xmpMM.InstanceID                         XmpText    41  uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b


What we can see is that the Title was replicated in 4 tags: Exif.Image.ImageDescription, Exif.Image.XPTitle, Xmp.dc.title and Xmp.dc.description.

Subject was put into Exif.Image.XPSubject.

Rating was put into Exif.Image.Rating, Exif.Image.RatingPercent, Xmp.xmp.Rating and Xmp.MicrosoftPhoto.Rating.

Tags were put into Exif.Image.XPKeywords (semicolon-separated list), Xmp.dc.subject (comma-separated list) and Xmp.MicrosoftPhoto.LastKeywordXMP (comma-separated list).

Comment was put into Exif.Image.XPComment.

A new field appeared: Xmp.xmpMM.InstanceID, which seems to be some kind of unique identifier.

All fields are Unicode encoded and display correctly on the Linux command line.
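
The byte counts in the exiv2 listing give a hint about how these fields are stored. As far as I can tell, the XP* fields hold null-terminated UTF-16LE text, while Exif.Image.ImageDescription holds null-terminated UTF-8. The Python sketch below only reproduces the sizes from the listing under that assumption; the helper names are my own and are not part of any library.

```python
# Sketch: reproduce the byte counts that exiv2 reports for the Windows Explorer fields.
# Assumption: XP* tags are null-terminated UTF-16LE, ImageDescription is null-terminated UTF-8.

def xp_field_bytes(text: str) -> bytes:
    """Encode text the way the XP* tags appear to store it (hypothetical helper)."""
    return (text + "\x00").encode("utf-16-le")

def description_bytes(text: str) -> bytes:
    """Encode text the way ImageDescription appears to store it (hypothetical helper)."""
    return (text + "\x00").encode("utf-8")

title = "WE cím: árvíztűrő ütvefúró"
print(len(xp_field_bytes(title)))      # compare with the size of Exif.Image.XPTitle
print(len(description_bytes(title)))   # compare with the size of Exif.Image.ImageDescription

# XPKeywords is a single string; the individual tags are separated by semicolons.
stored = xp_field_bytes("tag one;tag two")
keywords = stored.decode("utf-16-le").rstrip("\x00").split(";")
print(keywords)
```

With the title used in the test, the two lengths come out as 54 and 35 bytes, which matches the sizes shown for Exif.Image.XPTitle and Exif.Image.ImageDescription.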

iTag

With iTag the following fields can be added:
  • Title
  • Description
  • Tags
  • Rating (0-5 stars)
  • Geolocation

After adding the fields, I got the following:

Exif.Image.ImageDescription                  Ascii      44  iTag Description: árvíztűrő ütvefúró
Exif.Image.Software                          Ascii      24  ArcSoft MediaImpression
Exif.Image.DateTime                          Ascii      20  2015:12:11 09:08:27
Exif.Image.ExifTag                           Long        1  94
Exif.Photo.ExifVersion                       Undefined   6  (48 50 50 48 0 0)
Exif.Photo.PixelXDimension                   Long        1  5040
Exif.Photo.PixelYDimension                   Long        1  3360
Exif.Image.GPSTag                            Long        1  8704
Exif.GPSInfo.GPSVersionID                    Byte        4  2.0.0.0
Exif.GPSInfo.GPSLatitudeRef                  Ascii       2  North
Exif.GPSInfo.GPSLatitude                     Rational    3  21deg 31.30542'
Exif.GPSInfo.GPSLongitudeRef                 Ascii       2  West
Exif.GPSInfo.GPSLongitude                    Rational    3  77deg 46.87001'
Exif.Thumbnail.Compression                   Short       1  JPEG (old-style)
Exif.Thumbnail.XResolution                   Rational    1  72
Exif.Thumbnail.YResolution                   Rational    1  72
Exif.Thumbnail.ResolutionUnit                Short       1  inch
Exif.Thumbnail.JPEGInterchangeFormat         Long        1  236
Exif.Thumbnail.JPEGInterchangeFormatLength   Long        1  8304
Iptc.Application2.RecordVersion              Short       1  2
Iptc.Application2.ObjectName                 String     30  iTag Title: ▒rv▒zt▒r▒ ▒tvef▒r▒
Iptc.Application2.Keywords                   String     32  iTag c▒mke: ▒rv▒zt▒r▒ ▒tvef▒r▒ 1
Iptc.Application2.Keywords                   String     32  iTag cimke: ▒rv▒zt▒r▒ ▒tvef▒r▒ 2
Iptc.Application2.Headline                   String     30  iTag Title: ▒rv▒zt▒r▒ ▒tvef▒r▒
Iptc.Application2.Caption                    String     36  iTag Description: ▒rv▒zt▒r▒ ▒tvef▒r▒
Xmp.xmp.Rating                               XmpText     1  4
Xmp.xmp.ModifyDate                           XmpText    25  2015-12-11T09:08:27+01:00
Xmp.xmp.CreatorTool                          XmpText    23  ArcSoft MediaImpression
Xmp.exif.PixelYDimension                     XmpText     4  3360
Xmp.exif.PixelXDimension                     XmpText     4  5040
Xmp.exif.NativeDigest                        XmpText   414  36864,40960,40961,37121,37122,40962,40963,37510,40964,36867,
36868,33434,33437,34850,34852,34855,34856,37377,37378,37379,37380,37381,
37382,37383,37384,37385,37386,37396,41483,41484,41486,41487,41488,41492,
41493,41495,41728,41729,41730,41985,41986,41987,41988,41989,41990,41991,
41992,41993,41994,41995,41996,42016,0,2,4,5,6,7,8,9,10,11,12,13,14,15,16,
17,18,20,22,23,24,25,26,27,28,30;223C07D4517837F3184483D4250C4089
Xmp.exif.GPSVersionID                        XmpText     7  2.0.0.0
Xmp.exif.GPSLatitude                         XmpText    13  21,31.305418N
Xmp.exif.GPSLongitude                        XmpText    13  77,46.870012W
Xmp.exif.UserComment                         LangAlt     1  lang="x-default" iTag Description: árvíztűrő ütvefúró
Xmp.tiff.NativeDigest                        XmpText   134  256,257,258,259,262,274,277,284,530,531,282,283,296,301,318,319,529,532,306
,270,271,272,305,315,33432;3D85F00ECD3FEC961811F936FADCE274
Xmp.MicrosoftPhoto.Rating                    XmpText     2  75
Xmp.dc.title                                 LangAlt     1  lang="x-default" iTag Title: árvíztűrő ütvefúró
Xmp.dc.description                           LangAlt     1  lang="x-default" iTag Description: árvíztűrő ütvefúró
Xmp.dc.subject                               XmpBag      2  iTag címke: árvíztűrő ütvefúró 1, iTag cimke: árvíztűrő ütvefúró 2

The first thing we can see is that iTag adds IPTC fields, and that it writes them using Windows-1250 encoding, which is why the accented characters show up as ▒ in the exiv2 output.
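
The effect can be reproduced in a few lines of Python. This is only a sketch, assuming that iTag really wrote one Windows-1250 byte per character; the string is the title I typed in during the test.

```python
# Sketch: why the iTag IPTC strings look garbled in a UTF-8 terminal.
# Assumption: iTag wrote the text as Windows-1250 (one byte per character).

text = "iTag Title: árvíztűrő ütvefúró"

raw = text.encode("cp1250")   # bytes as iTag appears to have stored them
print(len(raw))               # compare with the size of Iptc.Application2.ObjectName

# Read as UTF-8, the accented bytes are invalid and get replaced.
garbled = raw.decode("utf-8", errors="replace")
print(garbled)

# Read with the right code page, the original text comes back.
recovered = raw.decode("cp1250")
print(recovered == text)
```

The encoded length is 30 bytes, the same as the size exiv2 reports for Iptc.Application2.ObjectName, which fits a single-byte code page.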

iTag uses the following mapping:

  • Title: Xmp.dc.title, Iptc.Application2.ObjectName, Iptc.Application2.Headline
  • Description: Exif.Image.ImageDescription, Xmp.dc.description, Xmp.exif.UserComment, Iptc.Application2.Caption
  • Tags: Xmp.dc.subject, Iptc.Application2.Keywords
  • Rating: Xmp.xmp.Rating, Xmp.MicrosoftPhoto.Rating
  • Geolocation: Xmp.exif.GPS*, Exif.GPSInfo.*
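
The Xmp.exif.GPS* values in the listing are stored as text in a "degrees,minutes + hemisphere letter" form, unlike the rational numbers in Exif.GPSInfo. A minimal Python sketch for turning that text into decimal degrees could look like this; parse_xmp_gps is a hypothetical helper of mine, and it only handles the "DD,MM.mmmm" form that appears in my file.

```python
# Sketch: convert the XMP GPS text form "DD,MM.mmmmR" to signed decimal degrees.
# parse_xmp_gps is a hypothetical helper name, not part of exiv2 or exiftool.

def parse_xmp_gps(value: str) -> float:
    """Parse e.g. '21,31.305418N' into decimal degrees (south and west are negative)."""
    ref = value[-1].upper()
    degrees_text, minutes_text = value[:-1].split(",")
    decimal = int(degrees_text) + float(minutes_text) / 60.0
    return -decimal if ref in ("S", "W") else decimal

lat = parse_xmp_gps("21,31.305418N")
lon = parse_xmp_gps("77,46.870012W")
print(round(lat, 6), round(lon, 6))
```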

XnViewMP

In XnViewMP I set the following fields:
  • Set comment
  • Caption
  • Headline
  • Stars
  • Tags

After setting these, exiv2 showed the following:

Exif.Image.Software                          Ascii      24  ArcSoft MediaImpression
Exif.Image.DateTime                          Ascii      20  2015:12:11 09:08:27
Exif.Image.ExifTag                           Long        1  94
Exif.Photo.ExifVersion                       Undefined   6  (48 50 50 48 0 0)
Exif.Photo.PixelXDimension                   Long        1  5040
Exif.Photo.PixelYDimension                   Long        1  3360
Exif.Thumbnail.Compression                   Short       1  JPEG (old-style)
Exif.Thumbnail.XResolution                   Rational    1  72
Exif.Thumbnail.YResolution                   Rational    1  72
Exif.Thumbnail.ResolutionUnit                Short       1  inch
Exif.Thumbnail.JPEGInterchangeFormat         Long        1  236
Exif.Thumbnail.JPEGInterchangeFormatLength   Long        1  8304
Iptc.Envelope.CharacterSet                   String      3
Iptc.Application2.Keywords                   String     46  XnViewMP Category: árvíztűrő ütvefúró 1
Iptc.Application2.Keywords                   String     46  XnViewMP Category: árvíztűrő ütvefúró 2
Iptc.Application2.Headline                   String     45  XNView MP Headline: árvíztűrő ütvefúró
Iptc.Application2.Caption                    String     44  XNView MP Caption: árvíztűrő ütvefúró
Xmp.dc.description                           LangAlt     1  lang="x-default" XNView MP Caption: árvíztűrő ütvefúró
Xmp.dc.subject                               XmpBag      2  XnViewMP Category: árvíztűrő ütvefúró 1, XnViewMP Category: árvíztűrő ütvefúró 2
Xmp.photoshop.Headline                       XmpText    45  XNView MP Headline: árvíztűrő ütvefúró
Xmp.lr.hierarchicalSubject                   XmpBag      2  XnViewMP Category: árvíztűrő ütvefúró 1, XnViewMP Category: árvíztűrő ütvefúró 2
Xmp.xmp.Rating                               XmpText     1  3
Xmp.MicrosoftPhoto.Rating                    XmpText     2  50

XnViewMP also uses IPTC tags, but it sets Iptc.Envelope.CharacterSet and encodes the strings as UTF-8.

XnViewMP stores the comment in the file, but exiv2 does not display it. Exporting the rating to the XMP metadata has to be enabled in the settings.

XnViewMP has better support for IPTC tags than for XMP tags: IPTC tags can be displayed on the browser page as captions for the thumbnails, which is not possible for XMP tags. It is also possible to display some EXIF information below the thumbnails, but in that case the character set of the text is not handled correctly (in the EXIF information box it is handled correctly).

Picasa

Picasa is weak on tagging on the one hand, but on the other hand it can do automatic face detection and export the result to the file, and it also has good built-in geotagging support.

The following settings are supported:
  • Subtitle (Képaláírás)
  • People (Emberek)
  • Location (Helyek)
  • Tags (Címkék)
  • Star

Exif.Image.Software                          Ascii      24  ArcSoft MediaImpression
Exif.Image.DateTime                          Ascii      20  2016:01:03 13:57:24
Exif.Image.ExifTag                           Long        1  106
Exif.Photo.ExifVersion                       Undefined   6  (48 50 50 48 0 0)
Exif.Photo.PixelXDimension                   Long        1  5040
Exif.Photo.PixelYDimension                   Long        1  3360
Exif.Photo.ImageUniqueID                     Ascii      33  ad72f569043bd47c5ba93b8f04979b1e
Exif.Image.GPSTag                            Long        1  200
Exif.GPSInfo.GPSVersionID                    Byte        4  2.2.0.0
Exif.GPSInfo.GPSLatitudeRef                  Ascii       2  North
Exif.GPSInfo.GPSLatitude                     Rational    3  21deg 31' 18.326"
Exif.GPSInfo.GPSLongitudeRef                 Ascii       2  West
Exif.GPSInfo.GPSLongitude                    Rational    3  77deg 46' 52.198"
Exif.GPSInfo.GPSAltitudeRef                  Byte        1  Above sea level
Exif.Thumbnail.Compression                   Short       1  JPEG (old-style)
Exif.Thumbnail.XResolution                   Rational    1  72
Exif.Thumbnail.YResolution                   Rational    1  72
Exif.Thumbnail.ResolutionUnit                Short       1  inch
Exif.Thumbnail.JPEGInterchangeFormat         Long        1  420
Exif.Thumbnail.JPEGInterchangeFormatLength   Long        1  8304
Iptc.Envelope.ModelVersion                   Short       1  4
Iptc.Envelope.CharacterSet                   String      3
Iptc.Application2.RecordVersion              Short       1  4
Iptc.Application2.Keywords                   String     42  Picassa cimke: árvíztűrő ütvefúró 1
Iptc.Application2.Keywords                   String     42  Picassa cimke: árvíztűrő ütvefúró 2
Iptc.Application2.Caption                    String     49  Picassa képaláírás: árvíztűrő ütvefúró
Xmp.xmp.ModifyDate                           XmpText    25  2016-01-03T13:57:24+01:00
Xmp.dc.description                           LangAlt     1  lang="x-default" Picassa képaláírás: árvíztűrő ütvefúró
Xmp.dc.subject                               XmpBag      2  Picassa cimke: árvíztűrő ütvefúró 1, Picassa cimke: árvíztűrő ütvefúró 2
Xmp.mwg-rs.Regions                           XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:AppliedToDimensions XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:AppliedToDimensions/stDim:w XmpText     4  5040
Xmp.mwg-rs.Regions/mwg-rs:AppliedToDimensions/stDim:h XmpText     4  3360
Xmp.mwg-rs.Regions/mwg-rs:AppliedToDimensions/stDim:unit XmpText     5  pixel
Xmp.mwg-rs.Regions/mwg-rs:RegionList         XmpText     0  type="Bag"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]      XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Name XmpText     5  Lacó
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Type XmpText     4  Face
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area/stArea:x XmpText     8  0.584722
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area/stArea:y XmpText     8  0.556101
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area/stArea:w XmpText     8  0.070635
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area/stArea:h XmpText     8  0.126488
Xmp.mwg-rs.Regions/mwg-rs:RegionList[1]/mwg-rs:Area/stArea:unit XmpText    10  normalized
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]      XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Name XmpText     4  Anyu
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Type XmpText     4  Face
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area/stArea:x XmpText     8  0.155258
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area/stArea:y XmpText     9  0.0778274
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area/stArea:w XmpText     9  0.0593254
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area/stArea:h XmpText     8  0.106845
Xmp.mwg-rs.Regions/mwg-rs:RegionList[2]/mwg-rs:Area/stArea:unit XmpText    10  normalized
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]      XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Name XmpText     5  Ákos
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Type XmpText     4  Face
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area/stArea:x XmpText     8  0.474702
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area/stArea:y XmpText     7  0.48869
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area/stArea:w XmpText     9  0.0680556
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area/stArea:h XmpText     8  0.122619
Xmp.mwg-rs.Regions/mwg-rs:RegionList[3]/mwg-rs:Area/stArea:unit XmpText    10  normalized
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]      XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Name XmpText     4  Ági
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Type XmpText     4  Face
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area XmpText     0  type="Struct"
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area/stArea:x XmpText     7  0.37748
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area/stArea:y XmpText     8  0.439732
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area/stArea:w XmpText     8  0.102183
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area/stArea:h XmpText     8  0.163393
Xmp.mwg-rs.Regions/mwg-rs:RegionList[4]/mwg-rs:Area/stArea:unit XmpText    10  normalized
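
The face regions in the Xmp.mwg-rs.Regions entries are stored in normalized units (0 to 1). To get back a pixel rectangle, they have to be scaled with the image size. The Python sketch below assumes that stArea:x and stArea:y are the centre of the region, which is how I read the MWG guidelines; region_to_pixels is a hypothetical helper name.

```python
# Sketch: convert a normalized MWG face region to a pixel rectangle.
# Assumption: stArea:x / stArea:y are the centre of the region.

def region_to_pixels(x, y, w, h, image_width, image_height):
    """Return (left, top, width, height) in pixels for a normalized region."""
    width = w * image_width
    height = h * image_height
    left = x * image_width - width / 2
    top = y * image_height - height / 2
    return round(left), round(top), round(width), round(height)

# First region from the listing, on the 5040 x 3360 scan.
box = region_to_pixels(0.584722, 0.556101, 0.070635, 0.126488, 5040, 3360)
print(box)
```

For the first region this gives a rectangle of 356 x 425 pixels with its top left corner at (2769, 1656). The values land almost exactly on whole pixels, which suggests that the centre-based reading is the right one.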

IPTC and Unicode encoding

By default, IPTC strings are encoded in the local code page, Windows-1250 or ISO-8859-2 (at least on my Hungarian Windows machine). To change this to Unicode, Iptc.Envelope.CharacterSet has to be set to a 3-byte string containing the bytes 0x1B 0x25 0x47 (the escape sequence ESC % G). When this is set, all strings in IPTC are treated as UTF-8, and all 4 programs handle this correctly.
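
To make the byte layout concrete, here is a small Python sketch of how this marker looks as a raw IPTC dataset (record 1, dataset 90), and how a reader could pick the encoding. The helper names are hypothetical, and the fallback to Windows-1250 is only my assumption based on what I saw on my own machine.

```python
# Sketch: the IPTC "coded character set" dataset (1:90) that switches IPTC strings to UTF-8.
# build_dataset is a hypothetical helper; it only covers the short dataset form.
import struct

UTF8_MARKER = b"\x1b%G"   # ESC % G, i.e. bytes 0x1B 0x25 0x47

def build_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Serialize one IPTC dataset: 0x1C, record, dataset, 2-byte big-endian length, value."""
    return struct.pack(">BBBH", 0x1C, record, dataset, len(value)) + value

def decode_iptc_string(raw: bytes, charset_marker: bytes = b"") -> str:
    """Decode an IPTC string: UTF-8 if the marker is set, else assume Windows-1250."""
    encoding = "utf-8" if charset_marker == UTF8_MARKER else "cp1250"
    return raw.decode(encoding)

charset_record = build_dataset(1, 90, UTF8_MARKER)
print(charset_record.hex(" "))
```

A program that ignores the marker and always decodes with the local code page will show UTF-8 text garbled, and the other way around.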

Tag handling in multiple containers

The standard locations for storing tags are Iptc.Application2.Keywords and Xmp.dc.subject. Windows Explorer additionally uses Exif.Image.XPKeywords and Xmp.MicrosoftPhoto.LastKeywordXMP. I have not tested the last two, but for the first two there are two flavors of handling: either XMP has priority, or the union of both is used. The difference shows when IPTC and XMP contain different tags. If XMP has priority, the IPTC tags are simply lost (not displayed, and not preserved when new tags are added); this is the case for XnViewMP and Picasa. If the union is used, both the IPTC and the XMP tags are processed, and when a new tag is added, all tags are copied to both the IPTC and the XMP fields; Windows Explorer and iTag work this way.
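
The two behaviours can be sketched in Python as follows. This is only an illustration of what I observed, not the actual logic of any of these programs; the function names and the sample tags are made up.

```python
# Sketch of the two keyword-handling strategies.
# Function names are hypothetical; real programs may differ in details.

def read_xmp_priority(iptc_keywords, xmp_subject):
    """XMP wins. Assumption: IPTC is only consulted when XMP has no keywords at all."""
    return list(xmp_subject) if xmp_subject else list(iptc_keywords)

def read_union(iptc_keywords, xmp_subject):
    """Union: keep every keyword from both containers, without duplicates, in order."""
    merged = []
    for tag in list(iptc_keywords) + list(xmp_subject):
        if tag not in merged:
            merged.append(tag)
    return merged

def add_tag(tags, new_tag):
    """Add a tag and return the list that is written back to both IPTC and XMP."""
    return tags if new_tag in tags else tags + [new_tag]

iptc = ["zoo", "beach"]
xmp = ["zoo", "flowers"]

print(add_tag(read_xmp_priority(iptc, xmp), "family"))  # "beach" is lost
print(add_tag(read_union(iptc, xmp), "family"))         # all tags survive
```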

