Get unique root links from a directory of symlinks

Issue

I have a largish directory filled with symlinks (created using ln -s), about 1 million of them. They look like so:

--img_dir
  -- img.jpg --> /path/to/some/img.jpg
  -- imgc.jpg --> /path/to/some/imgc.jpg
  -- imgd.jpg --> /path/to/some/imgd.jpg
  -- img2.jpg --> /path2/to2/some2/img2.jpg
  -- img3.jpg --> /path3/to3/some3/img3.jpg
  -- img21.jpg --> /path21/to21/some21/img2.jpg
  -- img31.jpg --> /path31/to31/some31/img3.jpg
<snip>

For record-keeping purposes, I would like a list of the unique base_dirs (the root directories) that the symlinks point into.

So, I would like the following output:

/path/to/some
/path2/to2/some2
/path3/to3/some3
/path21/to21/some21
/path31/to31/some31

I tried googling around to see how to achieve this in bash, but I was not able to find anything useful.

Any help or pointers would be much appreciated.

Solution

  • find can list symlinks (-type l)
  • realpath resolves each symlink to the absolute path of its target
  • dirname strips the final component from a path
  • sort sorts lines and, with -u, removes duplicates

Chained into a pipeline:

find img_dir -type l | xargs realpath | xargs dirname | sort -u

Or, logging errors:

find img_dir -type l 2>find-errs     |
xargs realpath       2>realpath-errs |
xargs dirname        2>dirname-errs  |
sort -u               >basedir-list
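
To sanity-check the pipeline before running it on a million links, a minimal sketch like the following builds a small scratch tree and runs the same commands on it. The directory and file names (src/a, one.jpg, and so on) are made up for the demo, and it assumes a realpath that accepts several arguments, such as the one in GNU coreutils.

```shell
#!/bin/sh
# Scratch tree for the demo; every name below is hypothetical.
# pwd -P gives the physical path, so it matches what realpath prints.
root=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$root/src/a" "$root/src/b" "$root/img_dir"
touch "$root/src/a/one.jpg" "$root/src/a/two.jpg" "$root/src/b/three.jpg"

# Three symlinks pointing into two distinct source directories.
ln -s "$root/src/a/one.jpg"   "$root/img_dir/one.jpg"
ln -s "$root/src/a/two.jpg"   "$root/img_dir/two.jpg"
ln -s "$root/src/b/three.jpg" "$root/img_dir/three.jpg"

# Same pipeline as above: list links, resolve them, keep the directory, dedupe.
result=$(find "$root/img_dir" -type l | xargs realpath | xargs dirname | sort -u)
printf '%s\n' "$result"

# Remove the scratch tree again.
rm -rf "$root"
```

The output should be the two source directories, one per line, even though three links were created.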

Some implementations of realpath and dirname may only allow a single argument. In that case, do

... | xargs -I@ realpath @ | xargs -I@ dirname @ | ...

The code above assumes there are no really weird paths (e.g. they mustn't contain newlines; with plain xargs, spaces and quotes in paths also cause trouble).
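
If odd file names are a concern, one possible workaround is to pass NUL-separated names through the whole pipeline. This is only a sketch and assumes GNU findutils and coreutils: -print0, xargs -0 and -r, realpath -z, dirname -z and sort -z are extensions that other implementations may lack. The function name unique_link_dirs is made up for illustration.

```shell
#!/bin/sh
# NUL-delimited variant of the pipeline (assumes GNU find, xargs, realpath, dirname, sort).
# unique_link_dirs is a hypothetical helper name, not part of any tool.
unique_link_dirs() {
    find "$1" -type l -print0 |   # emit link names separated by NUL
    xargs -0 -r realpath -z |     # resolve each link, NUL-separated output
    xargs -0 -r dirname -z |      # strip the final path component
    sort -zu |                    # sort and dedupe NUL-separated records
    tr '\0' '\n'                  # back to one directory per line for display
}
```

It would be called as unique_link_dirs img_dir >basedir-list. The final tr is only there for readability; drop it to keep the output NUL-separated for further processing.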

Answered By – jhnc

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
