Description: Find duplicated files/directories
Author: Catalin(ux) M. BOIE
Start date: 2012-04-09

Plan:
- compute sha1 on files/dirs lazily (compare sizes first; compute the checksum only after).
- sort the file and dir tables
- check directories first
- check files, hiding all siblings already reported above

DIR
    subdir1
        subsubdir1
        subsubdir2
        file1
DIR
    subdir2

DIR->subdirs = subdir1
subdir1->next = subdir2
subdir1->subdirs = subsubdir1
subsubdir1->next = subsubdir2

== Pseudocode ==
main.c:
    for every directory passed as a parameter:
        call nftw with callback 'callback':
            ignore !files and !dirs
            if we have already seen that inode, skip it
            if it is a dir, call dir_add:
                alloc a dir node q and fill in name, dev, ino, level
                if it is a level 0 dir (passed as parameter), add it to the dir_info array
                else:
                    find the parent dir and set ->parent to it
                    ->next_sibling = parent->subdirs
                    parent->subdirs = q
            else, call file_add:
                alloc a file node q
                set size, name, dev, ino, level and init the SHAs
                find the parent and add q to parent->files; also set the parent
                now add q also to a hash by size (file_info), sorted by size
    call file_find_dups:
        for every bucket of file_info that has at least one item:
            if there is no next item, no dup is possible, so mark it with the no_dup_possible flag
            for every item in the hash:
                group by size and call compare_file_range
                compare_file_range fills item->dups
    call dir_find_dups:
        for every dir passed as a parameter (dir_info):
            call dir_build_hash
        allocate an array that keeps all dirs that may have matches
        for every possible dir, call dir_find_dups_populate_list
        sort dirs by hash
        find same-hash dirs:
            call dir_process_range on first..last with the same hash:
                link all dirs under the lowest-level one
    call dump_duplicates:
        if the no_dup_possible flag is set, skip
        if do_not_dump is set, skip
        if it is alone in the chain, no dup is possible, skip
        for every same-hash dir:
            if left is 1, skip it because it was already dumped
            if do_not_dump is set, skip
            mark the dir as left, so it does not appear in a 'right' position
            mark the main dir as do_not_dump, because we already dumped it
            mark the current dir as do_not_dump, because we already dumped it
            dump

Damn complicated. Let's try a simple approach.

Let's build a single linked list of files, ordered by size (see the sketch below).
The hash was too complicated and saved nothing; maybe it saved some time when adding files.
Build the dirs list.
Keep in mind: mark dirs that contain files that cannot have duplicates (unique size).
Don't forget to sort the files inside a dir before building the hash.
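
A minimal C sketch of the simple approach above: a single linked list of files kept
ordered by size, with unique-size files flagged so they are never checksummed.
The struct layout and function names here are assumptions for illustration only,
not the project's actual code.

#include <stddef.h>
#include <sys/types.h>

struct file_node {
        char                    *name;
        off_t                   size;
        dev_t                   dev;
        ino_t                   ino;
        unsigned char           no_dup_possible;        /* unique size => cannot have a dup */
        struct file_node        *next;                  /* next file, ordered by size */
};

static struct file_node *files_head;

/* Insert q keeping the list sorted by size (ascending). */
static void file_list_insert(struct file_node *q)
{
        struct file_node **p = &files_head;

        while (*p && (*p)->size < q->size)
                p = &(*p)->next;

        q->next = *p;
        *p = q;
}

/*
 * One pass over the sorted list: a file whose size differs from both
 * neighbours cannot have a duplicate, so flag it and never checksum it.
 */
static void file_mark_unique_sizes(void)
{
        struct file_node *q, *prev = NULL;

        for (q = files_head; q; prev = q, q = q->next) {
                int same_prev = prev && prev->size == q->size;
                int same_next = q->next && q->next->size == q->size;

                q->no_dup_possible = !(same_prev || same_next);
        }
}

Keeping the list sorted by size means a single pass is enough to flag files with a
unique size, and checksums only ever need to be computed for runs of equal-size files.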