Keeping it Small and Simple

2007.12.13

Creating a tree utility in Python, part 2

Filed under: Python Tutorial — Lorenzo E. Danielsson @ 15:01

In the previous tutorial, I showed you how to recursively traverse a directory tree and list all the files and directories it contained. The output the program produced looks slightly spartan, so let’s work on that.

Adding lines

The tree utility uses lines. Each entry is preceded by a '|--' (a pipe character and two dashes). We will add that to our own implementation. While we’re at it, we will increase the indentation a little so that our items line up.


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path, indent=''):
    """ Recursively print the contents of a directory. """
    for file in os.listdir(path):
        fullpath = path + "/" + file
        print indent + '|-- ' + file

        if os.path.isdir(fullpath):
            print_tree(fullpath, indent+'    ')

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

print_tree(dir)

A problem related to the tree lines

Gradually, our tree program is taking shape. But there is still a lot left to do. First of all, our tree output doesn’t quite look right. Our vertical lines “break” any time we get to a sub-directory with content. This seems easy to fix.


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path, indent=''):
    """ Recursively print the contents of a directory. """
    for file in os.listdir(path):
        fullpath = path + "/" + file

        print indent + '|-- ' + file

        if os.path.isdir(fullpath):
            # A pipe plus three spaces keeps the indent four characters wide.
            print_tree(fullpath, indent+'|   ')

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

print_tree(dir)

Well, the good news is that we no longer have “broken” lines in the tree output. But you may have noticed another problem instead: every vertical line now extends to the very bottom of the output (it’s a bit hard to explain, so just try running the program on a few different directory trees and you will see what I mean). What should happen is that each vertical line extends only as far as the last item in its own list.

The last item in the list

In order to solve this we need to know if the current item we are processing is the last item in the list. There are several ways we can do this. I’ve chosen to use a way which I find to be readable and fairly easy to understand.

I now store the output of os.listdir() in a variable called files. I then use a slightly different loop, as you can see below. The variable i contains a number that corresponds to the index of an item in the files list. To know whether or not I’m on the last item in the list, I can compare i with len(files) - 1. To access an item in the list, I now have to use files[i].
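The last-item test can be tried on its own, without touching the file system. What follows is only a minimal sketch, written in Python 3 syntax (where print() is a function, unlike the Python 2 print statement used in this tutorial). The helper name mark_last and the sample list are made up for illustration.

```python
def mark_last(items):
    """Pair every item with a flag that says whether it is the last one."""
    result = []
    for i in range(0, len(items)):
        # Indexes run from 0 to len(items) - 1, so the last index is len(items) - 1.
        is_last = (i == len(items) - 1)
        result.append((items[i], is_last))
    return result

if __name__ == "__main__":
    for name, is_last in mark_last(["bin", "etc", "usr"]):
        print(name, is_last)
```

Only "usr" is flagged as the last item. An empty list simply produces no pairs at all, which is why the tree program needs no special case for empty directories.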

Having the ability to know whether the current file we are processing is the last one in the directory listing allows us to fix the problem with the vertical lines extending to the bottom of the tree output. Look at the following program.


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path, indent=''):
    """ Recursively print the contents of a directory. """
    files = os.listdir(path)
    for i in range(0, len(files)):
        fullpath = path + "/" + files[i]

        print indent + '|-- ' + files[i]

        if os.path.isdir(fullpath):
            # Both indent strings are four characters wide so entries stay aligned.
            if i == len(files) - 1:
                print_tree(fullpath, indent+'    ')
            else:
                print_tree(fullpath, indent+'|   ')

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

print_tree(dir)

It’s beginning to look better and better. But there is still something not quite right. Comparing our output to the output of the tree utility, we notice that in tree, the last entry in each list is preceded by a '`--' instead of the usual '|--'. Here again we take advantage of the fact that we rewrote our loop to know whether or not we are processing the last file.

Let’s look at our final version of the program for today:


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path, indent=''):
    """ Recursively print the contents of a directory. """
    files = os.listdir(path)
    for i in range(0, len(files)):
        fullpath = path + "/" + files[i]

        if i == len(files) - 1:
            print indent + '`-- ' + files[i]
        else:
            print indent + '|-- ' + files[i]

        if os.path.isdir(fullpath):
            # Both indent strings are four characters wide so entries stay aligned.
            if i == len(files) - 1:
                print_tree(fullpath, indent+'    ')
            else:
                print_tree(fullpath, indent+'|   ')

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

print_tree(dir)

There we go. Now the output looks quite decent. We will end here for this time. Next time, we’ll add a few command-line options to our tree utility.
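If you want to check the result without eyeballing the terminal, one option is a variant that collects the lines in a list instead of printing them. The following is a sketch under a few assumptions of my own, not the program from this post: it is written for Python 3, the name tree_lines is invented, it sorts the directory listing so the order is predictable, it uses os.path.join() to build paths, and it uses four-character indent strings throughout.

```python
import os
import tempfile

def tree_lines(path, indent=''):
    """Return the lines the tree program would print for path."""
    lines = []
    # Sorting is an addition for predictable output; os.listdir() order is arbitrary.
    files = sorted(os.listdir(path))
    for i in range(0, len(files)):
        fullpath = os.path.join(path, files[i])
        last = (i == len(files) - 1)

        # The last entry gets a backtick, every other entry a pipe.
        lines.append(indent + ('`-- ' if last else '|-- ') + files[i])

        if os.path.isdir(fullpath):
            # Below the last entry no vertical line is needed any more.
            lines.extend(tree_lines(fullpath, indent + ('    ' if last else '|   ')))
    return lines

if __name__ == "__main__":
    # Build a small throwaway tree so the output is predictable.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        open(os.path.join(root, "a", "x.txt"), "w").close()
        os.mkdir(os.path.join(root, "b"))
        open(os.path.join(root, "b", "y.txt"), "w").close()
        print("\n".join(tree_lines(root)))
```

For the throwaway tree built at the bottom, the vertical line next to x.txt stops once the last directory, b, has been printed.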

It is important that you understand how these small programs work. If you are having problems, just take out a pen and paper. Keep track of the values of the different variables and work your way through the loop, each time drawing the lines on the paper that the computer would draw on the screen. That way you will easily see exactly how the program works.

I believe that it helps you to work through the program if you type it yourself, line by line. That is part of the reason that I supply the full code to each example instead of just the lines that change. Try to avoid yanking and putting the code. If you really want to learn, do yourself the favor of just spending a few extra minutes typing the program.

Also, the principles behind this program work in other programming languages too, so if you know how to program in another language, say Java, you should easily be able to adapt this program for Java.

2007.12.11

Creating a tree utility in Python, part 1

Filed under: Python Tutorial — Lorenzo E. Danielsson @ 23:41

You are probably already familiar with the UNIX tree utility. It prints a directory tree to the terminal. It is possible that you have tree installed already. If you don’t, you probably have it in your distro’s repositories. On Debian

# aptitude install tree

should do the job. If you want the source, have a look here.

I am going to show you how you can write your own tree utility from scratch, and I will do it using the excellent Python programming language. You may already know that Python can make walking a directory tree very easy. But for now, let’s pretend that no such facilities exist.
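For reference, the facility I have in mind is most likely os.walk() from the standard library (the post does not name it, so treat that as an assumption). A minimal sketch in Python 3 syntax, with the made-up helper name walk_names, could look like this:

```python
import os

def walk_names(top):
    """Collect the path of every directory and file below top."""
    names = []
    # os.walk() yields one (dirpath, dirnames, filenames) triple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            names.append(os.path.join(dirpath, name))
    return sorted(names)

if __name__ == "__main__":
    for name in walk_names("."):
        print(name)
```

This gives a flat list of paths and does all the descending for us, which is exactly the part we want to write by hand in this tutorial.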

This tutorial is aimed at beginners, although you should already know some basic Python and how to create and run Python scripts. Not much else is needed. I am going to take it very slowly. We are going to let the program evolve. Hopefully that way you will understand why we have written the code in one way or another.

At the end you will have a functional tree utility. Your program then will look very different from the first one we will write. So, let’s get on with it.

The first thing we are going to do is print out the content of the current directory. This can be done as follows:


#! /usr/bin/env python

# Show the contents of a single directory.

import sys
import os

for file in os.listdir("."):
    print file

This shouldn’t be too difficult. The os.listdir() function returns a list of the files in the directory passed to it. In this case we pass in the current directory. Note that we don’t really need to import sys at this point, but we will soon use it so we might as well.
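To see what os.listdir() hands back, here is a small sketch in Python 3 syntax. The helper name list_names is made up, and the sorting step is my own addition: os.listdir() makes no promise about the order of the names.

```python
import os

def list_names(path="."):
    """Return the entries of path as a sorted list of names."""
    names = os.listdir(path)  # a plain list of strings, without '.' and '..'
    names.sort()              # os.listdir() returns the names in arbitrary order
    return names

if __name__ == "__main__":
    for name in list_names("."):
        print(name)
```

The names come back without any directory part, which matters later on when we need the full path of an entry.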

Directory to print as a command-line argument

Our first program always listed the contents of the current directory, which might not be all that useful. Let’s modify it to allow us to pass in a directory. If we don’t pass anything in, it will default to the current one.

The code looks as follows:


#! /usr/bin/env python

# Show the contents of a single directory.

import sys
import os

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

# Print the contents of the directory.
for file in os.listdir(dir):
    print file
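The argument handling has three outcomes: no argument, one argument, or too many. If you would like to try them without running the script from a shell each time, the same checks can be wrapped in a function. This is only a sketch in Python 3 syntax. The name pick_directory and the idea of returning an error message instead of exiting are my own assumptions, not part of the program above.

```python
import os

def pick_directory(argv, cwd):
    """Return (directory, None) on success or (None, error_message) on failure."""
    if len(argv) > 2:
        # Too many arguments: report how the program should be called.
        return None, "Usage: %s [path]" % argv[0]

    # One argument selects that directory, no argument selects the current one.
    directory = argv[1] if len(argv) == 2 else cwd

    # Make sure we really have a path.
    if not os.path.isdir(directory):
        return None, "E: that is not a valid path"
    return directory, None

if __name__ == "__main__":
    print(pick_directory(["tree.py"], os.getcwd()))
    print(pick_directory(["tree.py", "one", "two"], os.getcwd()))
```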

Recursion

The word recursion is something that is sometimes mentioned in books as if it were something really special. It’s not. It’s just a normal function call. The only difference is that you are calling a function from within the function itself. Recursion is very useful in some cases. One such case is when we want to “go into” sub-directories.
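If you have never written a recursive function, it may help to see one that has nothing to do with directories. The sketch below (Python 3 syntax, with the made-up name count_leaves) counts the values in a list that may contain other lists, which is the same shape of problem as directories inside directories.

```python
def count_leaves(item):
    """Count the non-list values inside an arbitrarily nested list."""
    if not isinstance(item, list):
        # A plain value counts as one and ends the recursion.
        return 1
    total = 0
    for child in item:
        # The function calls itself for every element of the list.
        total += count_leaves(child)
    return total

if __name__ == "__main__":
    print(count_leaves([1, [2, 3], [[4]]]))
```

The nested list in the example holds four values, no matter how deeply they are buried.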

We are going to modify the program so that it will print not only the contents of a directory, but it will also print the contents of all the sub-directories as well. To do this we will move some of the functionality into a function.


#! /usr/bin/env python

# Show the contents of a directory tree.

import sys
import os

def print_tree(path):
    """ Recursively print the contents of a directory. """
    for file in os.listdir(path):
        fullpath = path + "/" + file
        print file

        if os.path.isdir(fullpath):
            print_tree(fullpath)

# Process command-line arguments.
dir = os.getcwd()
if len(sys.argv) == 2:
    dir = sys.argv[1]
elif len(sys.argv) > 2:
    print "Usage: %s [path]" % sys.argv[0]
    sys.exit(0)

# Make sure we really have a path.
if not os.path.isdir(dir):
    print "E: that is not a valid path"
    sys.exit(0)

print_tree(dir)

There we go. Now it prints out the contents of the specified directory and all its sub-directories (if there are any). Moreover, if any of these sub-directories have their own sub-directories the contents of these will be printed as well. And so on.

The workhorse of this little program is the print_tree() function. We create a variable called fullpath, which simply joins the directory path and the name of the file we are currently processing. Notice the second half of the function. If a particular file is a directory, we call print_tree() again, recursively, this time passing the new directory to it.
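Gluing the path together with "/" works fine on UNIX, but the standard library can also do it for us. The sketch below compares the two approaches in Python 3 syntax. The function names are made up, and I use posixpath (the POSIX flavour of os.path) so that the result is the same whatever platform you try it on.

```python
import posixpath  # on UNIX systems, os.path is this very module

def join_by_hand(path, name):
    """Build the full path by string concatenation, as in the program above."""
    return path + "/" + name

def join_with_library(path, name):
    """Let the library decide whether a separator is needed."""
    return posixpath.join(path, name)

if __name__ == "__main__":
    print(join_by_hand("/tmp/", "notes.txt"))
    print(join_with_library("/tmp/", "notes.txt"))
```

When the directory already ends in a slash, the hand-made version produces a double slash while the library version does not. A double slash is normally harmless on UNIX, so the program above still works. It just looks untidy.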

Apart from that there is not much new in this program. Make sure that you understand what is going on at all times. If you are having problems with anything then I suggest that you grab a piece of paper and a pen and work through these examples. Write down the values that the different variables have as the program progresses.

One unfortunate thing with our current program is that the output list looks very flat. We get no indication of which files belong in sub-directories. Let’s fix that.

The beginnings of a tree

Look at the following code. I will explain what happens below.


 1 #! /usr/bin/env python
 2
 3 # Show the contents of a directory tree.
 4
 5 import sys
 6 import os
 7
 8 def print_tree(path, indent=''):
 9     """ Recursively print the contents of a directory. """
10     for file in os.listdir(path):
11         fullpath = path + "/" + file
12         print indent + file
13
14         if os.path.isdir(fullpath):
15             print_tree(fullpath, indent+' ')
16
17 # Process command-line arguments.
18 dir = os.getcwd()
19 if len(sys.argv) == 2:
20     dir = sys.argv[1]
21 elif len(sys.argv) > 2:
22     print "Usage: %s [path]" % sys.argv[0]
23     sys.exit(0)
24
25 # Make sure we really have a path.
26 if not os.path.isdir(dir):
27     print "E: that is not a valid path"
28     sys.exit(0)
29
30 print_tree(dir)

The only thing we’ve added is that the print_tree() function takes an additional argument: indent. This argument is optional, however. If we don’t pass anything to indent, it gets a default value of an empty string. The print statement on line 12 now prints the value of indent before the file name. Since the indent value defaults to an empty string it means that we don’t print anything before the file name by default.

But something happens on line 15 when we recursively call the print_tree() function. This time we pass the current value of indent with an extra space attached to it. This means that when we print indent followed by the file name again on line 12, the file name is preceded by a single space. If we recurse again, we now have indent+' ' meaning one space plus another space, making two spaces. And so on.
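The way indent grows can be watched without a real directory tree. In the sketch below (Python 3 syntax), a nested dictionary stands in for the file system: a name that maps to another dictionary plays the role of a directory, and a name that maps to None plays the role of a file. The function name indented_lines and the sample data are made up for illustration.

```python
def indented_lines(tree, indent=''):
    """Return one line per entry, indented by one space per level of nesting."""
    lines = []
    for name in sorted(tree):
        lines.append(indent + name)
        if isinstance(tree[name], dict):
            # One level deeper: pass the current indent plus one extra space.
            lines.extend(indented_lines(tree[name], indent + ' '))
    return lines

if __name__ == "__main__":
    sample = {"docs": {"a.txt": None, "old": {"b.txt": None}}, "setup.py": None}
    for line in indented_lines(sample):
        print(line)
```

In the sample, b.txt sits two levels down and so ends up with two spaces in front of it, while setup.py at the top level gets none.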

Simple, eh? I’ll leave you with this for now. Next time we will enhance the visual appearance of our directory tree.
