04 February, 2013

Cleaning file names in XBMC before scraping


XBMC is an excellent media center that runs on many different platforms. Besides its various playback options, it has very good scraping and library functionality: based on the video file name, it can download the DVD cover, director, actors and all kinds of other useful information. When using the program, you can then pick a movie from this nicely presented information and artwork rather than from a list of similar-looking file names.

During scraping, when the data is collected from movie databases on the internet (like IMDB), XBMC first tries to work out the movie title from the file name and then calls a plugin, which retrieves the data from the internet.

In this post I will discuss how the conversion from file name to movie title works. The cleandatetime and cleanstrings settings mentioned here can be customized in advancedsettings.xml, so you can fine-tune the process.
  1. The file name is matched against cleandatetime, a regular expression that looks for the date in the file name. If it finds one, everything before the date is considered the name of the movie and everything after the date is discarded. The date is a 4-digit number from 1900 to 2019, separated from the rest of the file name by special characters such as space, period, hyphen, _, (, ), [ and ] (a comma is also accepted after the date). The full regexp is the following:
    (.+[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-1][0-9])([ _\,\.\(\)\[\]\-][^0-9]|$)
    From here, \1 is the movie name and \2 is the date; \3 and everything after it is discarded.
    If no date is found, the file extension is removed from the file name, and the result is used as the movie name.
  2. After the first step, the result is processed further with cleanstrings, a set of regexps that are applied one after the other. When a match is found, the match and everything after it is removed from the end of the name.
    As I have a lot of movies named "local title - original title", I have added a very simple rule to my settings:
    <cleanstrings action="append"> <regexp>-</regexp> </cleanstrings>
    This simply discards the original title, and the scraper is then able to find the movie.
    It is a pity that it can only remove text from the end of the file name. I have some files with abbreviations at the beginning of the name, and it would be nice to be able to remove those as well.
  3. In the last step, underscores and dots are converted to spaces, but dots are converted only if there are no spaces at all in the file name.
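
To make the three steps easier to follow, here is a rough Python sketch of them. It is not XBMC's actual code, only my reading of the process described above. The function name clean_file_name and its clean_strings parameter are made up for this illustration, and two details are assumptions: a cleanstrings match at the very start of the name is skipped so the title cannot end up empty, and the "no spaces at all" check is done on the name as it stands after steps 1 and 2.

```python
import os
import re

# The cleandatetime regexp exactly as quoted in step 1.
CLEAN_DATE_TIME = re.compile(
    r"(.+[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-1][0-9])"
    r"([ _\,\.\(\)\[\]\-][^0-9]|$)"
)


def clean_file_name(file_name, clean_strings=()):
    """Return (title, year) for a video file name; year is None if no date is found."""
    # Step 1: split at the date, or fall back to dropping the extension.
    year = None
    match = CLEAN_DATE_TIME.search(file_name)
    if match:
        title, year = match.group(1), match.group(2)
    else:
        title = os.path.splitext(file_name)[0]

    # Step 2: apply each cleanstrings regexp in order and cut the name
    # at the start of the match.
    for pattern in clean_strings:
        found = re.search(pattern, title)
        # Assumption of this sketch: ignore a match at position 0,
        # otherwise nothing would be left of the title.
        if found and found.start() > 0:
            title = title[:found.start()]

    # Step 3: underscores always become spaces; dots only if the name
    # contained no spaces (assumption: checked before underscores are replaced).
    had_spaces = " " in title
    title = title.replace("_", " ")
    if not had_spaces:
        title = title.replace(".", " ")
    return title.strip(), year


if __name__ == "__main__":
    # With the "-" rule from my settings, only the local title survives.
    print(clean_file_name("Keresztapa - The Godfather (1972).avi", ["-"]))
    # Without spaces in the name, the dots are turned into spaces.
    print(clean_file_name("The.Godfather.(1972).mkv"))
```

With these inputs the first call gives ('Keresztapa', '1972') and the second gives ('The Godfather', '1972'). Note that with the regexp as quoted, the character right after the date separator must not be a digit, so a name like "Movie.1999.720p.mkv" does not match in this sketch.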

So this is the process of converting the file name to a movie title, and the way you can customize it.
If you have a lot of movies to scrape, it is worth customizing the process. If you run into limitations, you can do at least two things: a) find or write a utility that renames the movies to a format more compatible with XBMC; b) use the scraper plugins, which also offer some flexibility for regexp processing.
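
As an illustration of option a), here is a small sketch of a helper that strips an abbreviation from the beginning of a file name, which is the case cleanstrings cannot handle. The function names and the "HUN" abbreviation are hypothetical examples, not something XBMC provides. The sketch only computes the new names; the actual renaming (for example with os.rename) is left out on purpose so it can be tried safely first.

```python
import re


def strip_leading_abbreviation(file_name, abbreviations):
    """Return file_name without a leading abbreviation and the separators after it."""
    for abbreviation in abbreviations:
        # Anchor at the start and require at least one separator, so that
        # "HUN" does not eat the beginning of a word such as "Hunter".
        pattern = r"^" + re.escape(abbreviation) + r"[ ._\-]+"
        stripped = re.sub(pattern, "", file_name, count=1, flags=re.IGNORECASE)
        if stripped != file_name:
            return stripped
    return file_name


def plan_renames(file_names, abbreviations):
    """Return (old, new) pairs for the names that would actually change."""
    pairs = []
    for old in file_names:
        new = strip_leading_abbreviation(old, abbreviations)
        if new != old:
            pairs.append((old, new))
    return pairs


if __name__ == "__main__":
    names = ["HUN.The.Godfather.(1972).mkv", "Hunter.avi"]
    for old, new in plan_renames(names, ["HUN"]):
        print(old, "->", new)
```

Only the first name is changed here (to "The.Godfather.(1972).mkv"); "Hunter.avi" is left alone because no separator follows the abbreviation.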
