Keeping it Small and Simple

2007.12.11

Creating a tree utility in Python, part 1

Filed under: Python Tutorial, by Lorenzo E. Danielsson @ 23:41

You are probably already familiar with the UNIX tree utility. It prints a directory tree to the terminal. It is possible that you have tree installed already. If you don’t, you probably have it in your distro’s repositories. On Debian

# aptitude install tree

should do the job. If you want the source, have a look here.

I am going to show you how you can write your own tree utility from scratch, and I will do it using the excellent Python programming language. You may already know that Python can make walking a directory tree very easy. But for now, let’s pretend that no such facilities exist.

This tutorial is aimed at beginners, although you should already know some basic Python and how to create and run Python scripts. Not much else is needed. I am going to take it very slowly and let the program evolve. Hopefully that way you will understand why we have written the code in one way or another.

At the end you will have a functional tree utility. By then your program will look very different from the first one we write. So, let’s get on with it.

The first thing we are going to do is print out the content of the current directory. This can be done as follows:


#! /usr/bin/env python

# Show the contents of a single directory.

import sys
import os

for file in os.listdir("."):
    print file

This shouldn’t be too difficult. The os.listdir() function returns a list of the names of the entries (files and sub-directories) in the directory passed to it. In this case we pass in the current directory. Note that we don’t really need to import sys at this point, but we will soon use it so we might as well.
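
One detail worth knowing is that os.listdir() returns the names in arbitrary order and leaves out the special entries "." and "..". Here is a minimal sketch, assuming we want a predictable order, that sorts the list first. The helper name list_dir is made up for illustration, and print(...) is called with a single argument so the snippet runs under both Python 2 and Python 3.

```python
import os

def list_dir(path="."):
    """Return the names in `path`, sorted alphabetically."""
    # os.listdir() gives bare names (no directory prefix) in arbitrary order.
    return sorted(os.listdir(path))

for name in list_dir("."):
    print(name)
```

Sorting is optional. The listings in this post simply use whatever order os.listdir() hands back.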

Directory to print as a command-line argument

Our first program always listed the contents of the current directory, which might not be all that useful. Let’s modify it to allow us to pass in a directory. If we don’t pass anything in, it will default to the current one.

The code looks as follows:


#! /usr/bin/env python

# Show the contents of a single directory.

import sys
import os

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(1)  # non-zero exit status signals an error

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(1)  # non-zero exit status signals an error

# Print the contents of the directory.
for file in os.listdir(dir):
    print file
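
If you want to try the argument handling on its own, one possible sketch is to wrap it in a function that takes the argument list as a parameter. The name choose_directory and the convention of returning None for bad input are my own assumptions for this example, not part of the program above.

```python
import os

def choose_directory(argv):
    """Return the directory to list, or None if the arguments are invalid.

    `argv` has the same layout as sys.argv: the script name first,
    followed by an optional path.
    """
    if len(argv) > 2:
        return None            # too many arguments
    if len(argv) == 2:
        path = argv[1]
    else:
        path = os.getcwd()     # no argument: default to the current directory
    if not os.path.isdir(path):
        return None            # not an existing directory
    return path
```

A script could call choose_directory(sys.argv) and print the usage message when it gets None back. Passing the list in, rather than reading sys.argv inside the function, makes it easy to experiment from the interactive interpreter.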

Recursion

The word recursion is something that is sometimes mentioned in books as if it were something really special. It’s not. It’s just a normal function call. The only difference is that you are calling a function from within the function itself. Recursion is very useful in some cases. One such case is when we want to “go into” sub-directories.
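
To see recursion without any directories involved, here is a small sketch that counts the values in a list that may contain other lists, much like a directory may contain other directories. The function name count_items and the sample data are invented for this example.

```python
def count_items(items):
    """Count the non-list values in a list that may contain other lists."""
    total = 0
    for item in items:
        if isinstance(item, list):
            # A nested list: let the same function count what is inside it.
            total += count_items(item)
        else:
            total += 1
    return total

print(count_items([1, [2, 3], [4, [5, 6]]]))  # prints 6
```

Each call handles one list only. Whenever it meets a list inside that list, it hands that one over to a fresh call of itself.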

We are going to modify the program so that it prints not only the contents of a directory but also the contents of all its sub-directories. To do this we will move some of the functionality into a function.


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path):
    """ Recursively print the contents of a directory. """
    for file in os.listdir(path):
        fullpath = path + "/" + file
        print file

        if os.path.isdir(fullpath):
            print_tree(fullpath)

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(1)  # non-zero exit status signals an error

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(1)  # non-zero exit status signals an error

print_tree(dir)

There we go. Now it prints out the contents of the specified directory and all its sub-directories (if there are any). Moreover, if any of these sub-directories have their own sub-directories the contents of these will be printed as well. And so on.

The workhorse of this little program is the print_tree() function. We create a variable called fullpath, which joins the directory path and the name of the file we are currently processing, so that os.path.isdir() checks the right location. Notice the second half of the function. If a particular file is a directory, then we recursively call the print_tree() function again, this time passing the new directory to it.
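
Gluing the path together with "/" works fine on UNIX. As an aside, the standard library also offers os.path.join(), which inserts the right separator for the platform. Below is a sketch of the same traversal using it. The name collect_tree is hypothetical, and I have made it return a list instead of printing, and sort each directory, purely so the result is easy to inspect.

```python
import os

def collect_tree(path):
    """Return every name found under `path`, depth first."""
    names = []
    for name in sorted(os.listdir(path)):      # sorted only to get a stable order
        fullpath = os.path.join(path, name)    # same idea as path + "/" + name
        names.append(name)
        if os.path.isdir(fullpath):
            # Recurse into the sub-directory and append whatever it contains.
            names.extend(collect_tree(fullpath))
    return names
```

Calling collect_tree(".") gives the same names the program prints, one list entry per line of output, with each directory's contents following directly after the directory itself.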

Apart from that there is not much new in this program. Make sure that you understand what is going on at all times. If you are having problems with anything then I suggest that you grab a piece of paper and a pen and work through these examples. Write down the values that the different variables have as the program progresses.

One unfortunate thing with our current program is that the output list looks very flat. We get no indication of which files belong in sub-directories. Let’s fix that.

The beginnings of a tree

Look at the following code. I will explain what happens below.


 1 #! /usr/bin/env python
 2
 3 # Show the contents of a directory tree.
 4
 5 import sys
 6 import os
 7
 8 def print_tree(path, indent=''):
 9     """ Recursively print the contents of a directory. """
10     for file in os.listdir(path):
11         fullpath = path + "/" + file
12         print indent + file
13
14         if os.path.isdir(fullpath):
15             print_tree(fullpath, indent+' ')
16
17 # Process command-line arguments.
18 dir = os.getcwd()
19 if len(sys.argv) == 2:
20     dir = sys.argv[1]
21 elif len(sys.argv) > 2:
22     print "Usage: %s [path]" % sys.argv[0]
23     sys.exit(1)
24
25 # Make sure we really have a path.
26 if not os.path.isdir(dir):
27     print "E: that is not a valid path"
28     sys.exit(1)
29
30 print_tree(dir)

The only thing we’ve added is that the print_tree() function takes an additional argument: indent. This argument is optional, however. If we don’t pass anything to indent, it gets a default value of an empty string. The print statement on line 12 now prints the value of indent before the file name. Since the indent value defaults to an empty string it means that we don’t print anything before the file name by default.
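
If default argument values are new to you, this tiny sketch shows the mechanism in isolation. The function label is invented for the example, and print(...) is used with a single argument so it runs under both Python 2 and Python 3.

```python
def label(name, indent=''):
    """Return `name` with `indent` in front of it."""
    return indent + name

print(label("file.txt"))          # no indent passed, so the default '' is used
print(label("file.txt", "  "))    # two spaces are put in front of the name
```

The first call relies on the default, the second overrides it. print_tree() works the same way: the first call leaves indent out, while the recursive calls supply it.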

But something happens on line 15 when we recursively call the print_tree() function. This time we pass the current value of indent with an extra space attached to it. This means that when we print indent followed by the file name again on line 12, the file name is preceded by a single space. If we recurse again, we now have indent+' ' meaning one space plus another space, making two spaces. And so on.
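
To watch the indent grow without touching the disk, here is a sketch that runs the same idea over a nested dictionary standing in for a directory tree. The dictionary layout (None for a file, another dictionary for a directory), the sample names and the function name tree_lines are all assumptions made up for this example.

```python
def tree_lines(tree, indent=''):
    """Return one line per entry, indented by one space per level."""
    lines = []
    for name in sorted(tree):
        lines.append(indent + name)
        if isinstance(tree[name], dict):
            # A "directory": recurse with one more space of indent.
            lines.extend(tree_lines(tree[name], indent + ' '))
    return lines

example = {"a.txt": None, "sub": {"b.txt": None, "deep": {"c.txt": None}}}
for line in tree_lines(example):
    print(line)
```

This prints a.txt and sub with no indent, b.txt and deep with one space, and c.txt with two spaces, which is exactly the pattern described above.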

Simple, eh? I’ll leave you with this for now. Next time we will enhance the visual appearance of our directory tree.
