Exercise: Unix: Directories and Files

Lesson: Directories And Files

Listing Directory Contents With "ls"

Now that you're comfortable with some of the basics (hopefully), let's move on to something a little more useful. The ls command tells the system to list the contents of a directory. If no directory is specified, you get the contents of the current directory (most likely your "home" directory). To make sure you are in your home directory, type

>cd

If upon typing ls you are simply returned the prompt, this implies that the directory has no files in it. Or does it? Some file or directory names may be "hidden" and will not show up using a simple ls command. These files usually begin with a period, or "dot" ( . ). It is wise to avoid naming files or directories with a dot prefix unless you really want to hide them. (Sometimes a dot is necessary, such as if you were to set up a World Wide Web directory so that you could create a homepage. These directories are usually named ".www".)

Let's assume that there are a few files in your directory, though you probably haven't created any. More than likely there will be a handful of hidden files around as well. So, let's use the ls command.

>ls

somefile      prog.c      index.html    next

>

The actual layout of the file list may vary, but this will essentially be the output. This is fine, but doesn't give us very much information about the files. Also, we really can't tell if "somefile" and "next" are files or directories (most suffixed names are files, hence we take for granted that "prog.c" and "index.html" are files). The easiest (?) way to determine what the files are is to use the -l option.

> ls -l

-rw-------   1 jmorris  study  2379 Jun 20 13:14 somefile

-rw-------   1 jmorris  study  2382 Jun 20 13:14 prog.c 

-rw----r--   1 jmorris  study  1262 Jun 25 11:27 index.html

drwx------   4 jmorris  study   512 Jun 24 13:52 next

>

This gives us some useful information. We'll talk about the "permissions" list later (that's the -rw------- thing) but just note that a directory has a "d" as the first letter instead of a "-". We also know who owns these files (some jmorris fellow here, though usually it will be you) as well as the name and size of each file and the date it was last modified.

There is another way to spot directories, that may be a little "cleaner". You can use the -F option with ls and any directories will be followed by a "/". For example:

> ls -F

somefile    prog.c     index.html   next/

>

But we still haven't seen any of the hidden files, so how do we get them to show themselves? Use the -a option. You can combine options, and I would recommend it here. -a can get a tad messy when used by itself, so use it and -l to get a nice listing of your files.

> ls -al

drwx--x--x  22 jmorris  study   2048 Jun 25 11:57 .

drwxr-xr-x 163 root     wheel   4096 Jun 22 09:19 ..

-rw-------   1 jmorris  study  27630 Feb  4 14:01 .hithere

-rw-------   1 jmorris  study   2379 Jun 20 13:14 somefile

-rw-------   1 jmorris  study   2382 Jun 20 13:14 prog.c

-rw----r--   1 jmorris  study   1262 Jun 25 11:27 index.html

drwx------   4 jmorris  study    512 Jun 24 13:52 next

>

We've found the hidden files! But what are the hidden directories "." and ".."? And who's this root fellow? Though "." can mean different things, in this case it means the "current" working directory. ".." (dot dot) means the parent directory. By parent we mean the directory that this directory (".") is a subdirectory of. As for root, it is the system administrator's account (the "super-user"). Here root owns the parent directory, which holds the home directories of all the users. (There's a little more to it than that, but we won't worry too much about it.)
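The ls options above can be tried safely in a throwaway directory. This is a minimal sketch: the scratch directory comes from mktemp -d (assumed to be available on your system), and the file names simply echo the ones in the listings above.

```shell
# Work in a scratch directory so nothing in your real home directory is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch somefile .hithere   # one ordinary file and one hidden "dot" file
mkdir next                # one directory

ls       # plain listing: the hidden file is skipped
ls -F    # directories are marked with a trailing /
ls -a    # hidden entries, plus . and .., now appear
ls -al   # the same entries in long format

# Keep the listings in variables so they can be inspected afterwards.
plain=$(ls)
marked=$(ls -F)
all=$(ls -a)
```

Comparing the plain listing with the -a listing shows exactly which names the dot prefix hides.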

Making And Changing Directories

Before we go and change directories, I thought you might want to make a directory of your own. To make (create) a directory use the mkdir command.

Try creating two new directories:

> mkdir misc1
> mkdir misc2
>

You enter a directory like this:

> cd misc1
>

The cd (change directory) command makes "misc1" the current directory. Suddenly you remember that you wanted to go in the "misc2" directory and not here. How do you get there? There are two ways. First, you could go back to the "parent" directory (dot dot) and then cd into the "misc2" directory.

> cd ..
> cd misc2
>

Secondly, you could use a relative pathname to go straight to "misc2".

> cd ../misc2
>

Roughly translated this command means: "go up to the parent directory and then change to the misc2 directory". Once you get the hang of it, this is the best way to get around in your directories. If you get lost, you can always give the absolute directory references like this:

cd /nsm/home/mwbecker

or, to go to your home directory, just type cd without any arguments:

cd

Also, you can always ask the system where you are with the command pwd, which stands for print working directory.
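The navigation commands in this section can be strung together as follows. This is a sketch run inside a scratch directory (created with mktemp -d, an assumption about your system), so the misc1 and misc2 made here are separate from the ones in your home directory.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir misc1 misc2   # mkdir accepts several directory names at once
cd misc1            # enter misc1
cd ../misc2         # up to the parent, then down into misc2, in one step

here=$(pwd)         # ask the system where we are
echo "$here"        # an absolute path ending in /misc2

cd ..               # back up to the scratch directory
```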

Exercise

1. Go to your home directory using the cd command.

2. List all of the files in your home directory, including hidden files.

3. Move up one directory, and then determine the current working directory.

4. Make your terminal window shorter than the list of files. Now list all the files with file information, forcing the terminal to print only one page at a time using the more pipe command.

Viewing and Editing Files

The easiest way to look at a file is to use the more command. Try displaying a text file if you have one lying around. The syntax is:

> more filename

The more command displays a file page by page, just like the more pipe command. Just type q to stop the display.

An editor is a utility program which you use to make modifications to the contents of a file. When we talk about an editor, we usually mean a text editor. That is, an editor designed to deal with files containing strings of characters in a particular character set. Additionally an editor usually means an interactive utility, where you can view what you have already done, before deciding to make changes.

There are two types of editors, line and screen. The basic unit for change in a line editor is a line (a string of characters terminated by a newline character). The popularity of line editors pales in comparison to that of screen editors. A screen editor is one where a portion of the file is displayed on the terminal screen, and the cursor can be moved around the screen to indicate where you want to make changes. You can select which part of the file you want to have displayed. Screen editors are also called display editors, or visual editors.

Two of the more popular screen editors used in these parts are vi and pico. Most people who use Unix use the vi editor, but this editor has a very steep learning curve. We will use pico, but be aware that sometimes pico doesn't display hidden characters (like returns) correctly and it may be necessary to resort to vi if you think something should be there that isn't.

To see what pico looks like try typing:

> pico testfile1
>

This should open an empty file. Type in several lines of nonsense text, save the file with CTRL-O (write out), and exit pico with CTRL-X. (If you exit without saving, pico asks whether to save the buffer.) List the contents using the more command.

If you want to learn more about pico, see the online documentation at:

http://www.acsu.buffalo.edu/~zelli/pico.html

Copying, Moving, and Renaming Files

-> cp command

cp (copy) lets you copy a file to another file. It takes two arguments, the names of files, or pathnames to files. Try a simple example:

>cp testfile1 testfile2

which means "copy from testfile1 to testfile2". You should have created a file called testfile2.

You can use most of the "tricks" discussed in changing directories (cd) with cp (as well as mv, rm). For example, try copying testfile1 to the misc1 directory you created previously:

>cp testfile1 misc1/testfile1

or

>cp testfile1 misc1

Both of these commands create a copy; if you don't give the file name, it is assumed that it stays the same. Finally, cd to misc1 and copy the file testfile1 to testfile3 in the parent directory:

>cp testfile1 ../testfile3

One last note. You can copy multiple files using cp. However, you can only copy the files to an existing directory and you can't rename them.

>cp file1 file2 dir1 (should work)

or

>cp file1 file2 file3 (should give an error)

-> mv command

mv (move) is very similar to cp, except that you are moving the same file around and not making copies of it. There are two basic uses for mv: renaming files, or moving them to another location.

For example, move testfile1 into the misc2 directory:

>mv testfile1 misc2

Note that after this command is successful, there is no longer a file named testfile1. Its contents are now in the newly created misc2/testfile1 file.

Note: you need to be very careful with mv and cp because they will overwrite files without warning. These files cannot be recovered once overwritten! Unlike rm with the -i option (discussed next), cp and mv give no warnings by default.
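The overwrite behaviour is easy to see for yourself. The sketch below uses made-up file names (report.txt and draft.txt) inside a scratch directory from mktemp -d. It also mentions the -i option, which cp and mv accept just as rm does; it is left commented out because it waits for an answer at the keyboard.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "old contents" > report.txt
echo "new contents" > draft.txt

cp draft.txt report.txt   # no question asked: report.txt is replaced
cat report.txt            # the old contents are gone

mv draft.txt report.txt   # mv overwrites in the same silent way,
                          # and draft.txt no longer exists afterwards

# cp -i draft.txt report.txt   # with -i, cp would ask before overwriting
# mv -i draft.txt report.txt   # likewise for mv
```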

-> rm command

As is implied by its name, rm allows you to remove (delete) files and directories.

First try removing testfile1 from the directory misc2. You can either cd to misc2 and remove like this:

>cd misc2
>rm testfile1

or you can remove it from the misc2 directory while still in the parent directory:

>rm misc2/testfile1

Use the ls command to assure that the file has been removed.

You can remove multiple files at once. With the "-i" (interactive) option, rm asks for confirmation before removing each one:

>rm -i file1 file2
rm: remove file1?n
rm: remove file2?y
>

If you type anything which begins with a y (for yes), rm accepts that as confirmation that the file is to be removed. If you type anything which does not start with y, rm does not remove the file. So in our example file2 has been removed, but file1 has not.

Another option which is the inverse of "-i" is "-f" (for force). "-f" forces removal of the files without question, even if they are write protected. (I'll get to "write protected" soon in the permissions section. Basically, you don't have permission to make changes to the contents of the file.)

Try to delete the directory misc1 using rm. You probably got an error message similar to:

>rm misc1
rm: misc1 is a directory
>

To remove directories you could use the rmdir command, but it has a disadvantage: the directory must be empty. It cannot contain any files or subdirectories. Luckily, we can use rm and another fancy option to remove directories (and their contents) efficiently.

When used with the "-r" (for recursive) option, rm searches down the directory tree, removing all files it finds. When a subdirectory is empty, rm then removes that subdirectory. The command does this for every file in every subdirectory (and so on) that it finds in the specified directory. On the UB systems, rm may be set up to make you confirm every deletion, even with the -r option. If this happens, type the command unalias rm, and then try it again.
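The difference between rmdir and rm -r can be sketched as follows. The directory tree is invented for the illustration and lives in a scratch directory from mktemp -d; the -p option to mkdir, which creates the parent directory as needed, is an extra not covered in this lesson.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir -p next/sub          # a directory containing a subdirectory
touch next/a next/sub/b    # one file at each level

# rmdir only removes empty directories, so this attempt fails.
if rmdir next 2>/dev/null; then
    echo "rmdir removed next"
else
    echo "rmdir refused: next is not empty"
fi

rm -r next                 # removes the files, then sub, then next itself
ls                         # prints nothing: the scratch directory is empty
```

Because rm -r never asks twice, check the directory name carefully before pressing return.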

Using Wild-cards

Now to very quickly mention some special characters which are often called "wild-card" characters, taken from the analogy with card games, where a Joker can be any card. These special characters are used to "match" filenames or parts of filenames. They ease the job of specifying particular files or whole groups of files.

The first wild-card (or metacharacter) is the question mark "?". It matches any single character.

>ls c?

This command finds all files that have names consisting of the letter "c" followed by a single character. You can use multiple "?"'s:

>ls c??

prints the names of all the files which consist of a "c" followed by two more characters.

The asterisk character * (star) matches any string of characters, including a string of zero length (also called a null string):

>ls c*

This command finds all filenames that begin with "c", regardless of how long the filename is.

A string of characters enclosed in the [ and ] brackets is known as a "character-class". The meaning of this construct is "match any single character which appears within the brackets". For example:

>ls c[12684xyz]

This command lists all filenames which begin with "c" and are followed by one of "1" or "2" or "6" or "8" or "4" or "x" or "y" or "z".

All wild-cards can be combined together to form arguments. Also, I only used ls in the above examples, but you can use any command. (But, be careful! Especially with rm.)
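A safe way to preview what a wild-card will match is to hand it to echo before using it with a command like rm. The sketch below builds a scratch directory (mktemp -d, assumed available) with a few invented file names and captures what each pattern expands to.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch c c1 c2 c9 cx cab cat dog

one=$(echo c?)             # "c" plus exactly one character
two=$(echo c??)            # "c" plus exactly two characters
any=$(echo c*)             # "c" plus anything, including nothing
set=$(echo c[12684xyz])    # "c" plus one character from the class

echo "$one"    # c1 c2 c9 cx
echo "$two"    # cab cat
echo "$any"    # c c1 c2 c9 cab cat cx
echo "$set"    # c1 c2 cx
```

Note that c* matches the bare name "c" as well, while "dog" is never matched because every pattern starts with a literal "c".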

About File Permissions

Every file or directory anyone creates on a UNIX system has an owner, usually the person who created the file or directory in the first place. The owner of a file can then assign various permissions (or protections), allowing or prohibiting access to that directory or file.

For every file and every directory in the file system, there are three classes of users who may have access:



Owner       The Owner is the user who initially created it.

Group       Since a bunch of users can be combined into a user
            group, there is a group ownership associated with
            each file and directory.

Public      All other users of the UNIX system.  That is,
            anyone who has a user-name and can gain access
            to the system.

Every file and every directory on the UNIX system has three types of permission, which describe what kinds of things can be done with the directory or file. Because directories and files are slightly different entities, the interpretation of the permissions also differs slightly. The meanings assigned to the permissions are:



Read        A user who has read permission for a file can look
            at the contents of that file.

            A user who has read permission for a directory can
            find out what files there are in that directory.

Write       A user who has write permission for a file can
            change the contents of that file.

            A user who has write permission for a directory can
            change the contents of the directory: they can
            create new files and remove existing files.

Execute     A user who has execute permission for a file can
            use that filename as a UNIX system command.

            A user who has execute permission for a directory
            can change directory to that directory, and can
            copy from that directory, provided they also have
            read permission for that directory.

By combining the three types of permissions and the three types of user, we can come up with a total of nine permissions:

read    permission for the owner (user)
write   permission for the owner (user)
execute permission for the owner (user)

        read    permission for the group
        write   permission for the group
        execute permission for the group

                read    permission for the public (others)
                write   permission for the public (others)
                execute permission for the public (others)

These nine permissions are usually written:

rwxrwxrwx

A missing permission is indicated by a "-" and is called a "protection". The nine permissions, or protections, are collectively known as the "mode" of the file or directory.

Changing Permissions (chmod)

The command chmod changes the mode of a file or directory. The mode can only be changed by the owner (that is the user who first created the file or directory), or by the super-user (root). Unix is very concerned with security, so when files are created they can only be altered by the owner. There may be times when you want more people to have access to a file, however, such as when you are making web pages, or want to share a directory with another person or a group of people. Permissions are complex, so we will only cover how to set some common permissions here.

chmod can be used in several ways, but here we will use only the "numerical specification". This method involves representing each permission by a 1 and each protection by a 0 (the binary system), hence "rw-r--r--" translates to "110100100". But using binary is not as nice as one would like (as you can imagine), so "110100100" is translated into "644" in octal notation. Rather than try to explain octal notation, we will just use octal notation for certain files because it is convenient. For example web page files (html) are usually given the permission 644, because they only need to be read, but need to be read by everybody.

So your chmod command will look like:

>chmod 644 thisfile thatfile theotherfile
>

Again, you can change the mode of many files at the same time (as above) or simply one file at a time.
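For the curious, the translation from a permission string to its number is simple arithmetic: within each group of three, r counts 4, w counts 2 and x counts 1, and the sum is one digit. So rw- is 4+2 = 6 and r-- is 4, giving 644. The sketch below does this sum in the shell. Note that mode_to_octal is a hypothetical helper written for this illustration, not a standard Unix command, and the scratch directory comes from mktemp -d.

```shell
# mode_to_octal: hypothetical helper that turns a nine-character
# permission string such as rw-r--r-- into its octal number.
mode_to_octal() {
    mode=$1
    result=""
    while [ -n "$mode" ]; do
        triple=$(printf '%s' "$mode" | cut -c1-3)   # next group of three
        mode=$(printf '%s' "$mode" | cut -c4-)      # whatever is left
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read    = 4
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        result=$result$digit
    done
    printf '%s\n' "$result"
}

mode_to_octal rw-r--r--   # prints 644
mode_to_octal rwxr-xr-x   # prints 755
mode_to_octal rw-------   # prints 600

# Apply a mode to a scratch file and read it back with ls -l.
scratch=$(mktemp -d)
touch "$scratch/index.html"
chmod 644 "$scratch/index.html"
ls -l "$scratch/index.html"   # the listing begins with -rw-r--r--
```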

Archiving And Compressing Files

The tar command is most often used to archive files. It will be an extremely important command for this class, because you will be generating large files and have limited file space. Typically, we store files in a "tarball". In Unix, making a tarball is a two-step process: we use the tar command to gather all the files together, and then gzip or compress to reduce the size of the tar file. You may be familiar with zip files on the PC. Programs like winzip do both tar and gzip or compress in one step. That is far too straightforward for Unix.

The format of the tar command is

tar {options} file1 file2 ... fileN

where {options} is the list of commands and options for tar, and file1 through fileN is the list of files to add or extract from the archive.

For example, try archiving your misc1 and misc2 directories into a tar file:

> tar cvf backup.tar misc1 misc2

This will pack all of the files in misc1 and misc2 into the tar archive "backup.tar" while maintaining the directory structure. The first argument to tar, cvf, is the tar "command". The c tells tar to create a new archive file. The v option forces tar into verbose mode, printing each filename as it is archived. The f option tells tar that the next argument, backup.tar, is the name of the archive to create. The rest of the arguments to tar are the file and directory names to add to the archive.

The command

> tar xvf backup.tar

will extract the tar file backup.tar in the current directory. This can sometimes be dangerous---when extracting files from a tar file, old files are overwritten.

Furthermore, before extracting tar files it is important to know where the files should be unpacked. For example, let's say you archived the following files: next/index.html, next/prog.c, and next/assign.dvi. If you use the command

> tar cvf backup.tar next/index.html next/prog.c next/assign.dvi

the directory name next/ is added to the beginning of each filename. In order to extract the files to the correct location, you would need to change to the directory that contains next (in this example, your home directory) before extracting:

> cd
> tar xvf backup.tar

because files are extracted with the pathname saved in the archive file.

If, however, you archived the files with the command

> cd next
> tar cvf backup.tar index.html prog.c assign.dvi

the directory name is not saved in the archive file. Therefore, you would need to "cd next" before extracting the files. As you can see, how the tar file is created makes a large difference in where you extract it. The command

> tar tvf backup.tar

may be used to display an "index" of the tar file before unpacking it. In this way you can see what directory the filenames in the archive are stored relative to, and can extract the archive from the correct location.
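The effect of where you run tar can be checked by building both kinds of archive and listing them. This is a sketch in a scratch directory (mktemp -d, assumed available); the archive names with-dir.tar and without-dir.tar are made up for the comparison, and the v option is left off to keep the output short.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir next
touch next/index.html next/prog.c

# Archive from the parent directory: the next/ prefix is stored.
tar cf with-dir.tar next/index.html next/prog.c

# Archive from inside next: no directory name is stored.
# The parentheses run the two commands in a subshell, so the cd
# does not change the directory of the rest of the script.
( cd next && tar cf ../without-dir.tar index.html prog.c )

tar tf with-dir.tar      # next/index.html and next/prog.c
tar tf without-dir.tar   # index.html and prog.c
```

Listing an unfamiliar archive with t before extracting it with x is a cheap way to avoid scattering files into the wrong directory.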

Unlike archiving programs for the pc, tar does not automatically compress files as it archives them. Therefore, if you are archiving two 1-megabyte files, the resulting tar file will be two megabytes in size. The gzip command may be used to compress a file (the file to compress need not be a tar file).

Try compressing the backup.tar file using gzip:

> gzip backup.tar

This will compress backup.tar and leave you with backup.tar.gz, the compressed version of the file.

The gunzip command may be used to uncompress a gzipped file.

gzip is a relatively new tool in the UNIX community. For many years, the compress command was used instead. However, because of several factors, compress is being phased out.

Files compressed with compress end in the extension .Z. For example, backup.tar.Z is the compressed version of backup.tar, while backup.tar.gz is the gzipped version. The uncompress command is used to expand a compressed file; gunzip knows how to handle compressed files as well.

Therefore, to archive a group of files and compress the result, you can use the commands:

> tar cvf backup.tar next
> gzip backup.tar

The result will be backup.tar.gz. To unpack this file, use the reverse set of commands:

> gunzip backup.tar.gz
> tar xvf backup.tar

Of course always make sure that you are in the correct directory before unpacking a tar file.

You may want to do this process in one command, and then go have lunch. Tarballing files can be rather slow. You can use some UNIX cleverness to do all of this on one command line, as in the following:

> tar cvf - next | gzip -c > backup.tar.gz

Here, we are sending the tar file to "-", which stands for tar's standard output. This is piped to gzip, which compresses the incoming tar file, and the result is saved in backup.tar.gz. The -c option to gzip tells gzip to send its output to stdout, which is redirected to backup.tar.gz.

A single command used to unpack this archive would be:

> gunzip -c backup.tar.gz | tar xvf -

Again, gunzip uncompresses the contents of backup.tar.gz and sends the resulting tar file to stdout. This is piped to tar, which reads "-", this time referring to tar's standard input.
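The two one-line commands can be tested as a round trip: pack a directory, unpack it somewhere else, and check that the file comes back intact. The sketch below does this in a scratch directory (mktemp -d, assumed available); the restore directory and the file contents are invented for the test.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir next
echo "hello" > next/prog.c

# Pack: tar writes the archive to stdout, gzip compresses it.
tar cf - next | gzip -c > backup.tar.gz

# Unpack in a separate directory so the original is not overwritten.
mkdir restore
cd restore || exit 1
gunzip -c ../backup.tar.gz | tar xf -

cat next/prog.c   # prints hello
```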

Exercise:

Create a temporary directory called misc3 at the same level as misc1 and misc2. Move the tar-balled file into that directory, and then unzip the file. Check the contents of the unzipped tar file. Now extract the tar file. Use the ls command to look at the directory structure. Notice that it recreates the directory from the level at which the tar command was executed.

Sending and Receiving Remote files (FTP)

FTP is an acronym for File Transfer Protocol. FTP is a client/server application that allows the transfer of files between computers. This transfer can take place between a mainframe and a local terminal, or as a transfer of information over the Internet between your computer and a distant server. FTP is a powerful application which allows users to access archives that are available on a large number of computer hosts.

The key elements of FTP are:

finding FTP sites from your client based system

establishing a connection with the server

developing an ability to search through "archives" to retrieve information

using FTP commands to facilitate the transfer of information

allowing for the differences in file types and compression techniques.

The idea of client/server is important here - that you, the "local" client, are initiating a communication pathway with a "remote" server that may contain public information of interest to you. We will look at some of the basic commands and tools for FTP, and then point to other sources for a more detailed discussion of the subject matter.

Basic Commands

Some of the commands you use to get around in FTP are the same as normal Unix commands:

cd allows you to change directories on the remote computer either up or down
pwd gives the client a chance to view the current directory and pathway on the remote host

Others are particular to FTP:

lcd, lpwd do the same as the above, but for the directories on your local computer
open initiates the session between the client and the server
dir, ls list the hierarchical organization of files on the remote server
get allows you to retrieve a file from the server down to your local client computer.
put allows you to place a file from your client up to the remote server computer.
mput/mget are the same as above but allow for multiple files to be manipulated with a single command
prompt sets interactive prompting; "on" is a safety feature prompting you for verification of each step of the multiple commands, "off" allows the commands to act unimpeded
ascii/binary allow you to specify the type of file to be transferred
quit ends the connection and ends the session.

To find the full array of commands, type help or ? at the FTP> prompt. The above list is by no means exhaustive but does give the most commonly used commands.

Initiating a Session

Most commonly you will initiate "anonymous" ftp sessions. This means you are logging into a server that doesn't care who you are; anyone can get the files. This is what free software sites and the like use to distribute files. There is a pretty standard way to access anonymous ftp sites. You connect to the site, give "anonymous" as your username and your email as the password.

For example, try connecting to the engineering ftp site here on campus. You should get something like this:

> ftp ftp.eng.buffalo.edu
Connected to confucius.eng.buffalo.edu.
220 confucius.eng.buffalo.edu FTP server (Version wu-2.6.1(3) Thu Sep 14 11:24:35 EDT 2000) ready.
Name (ftp.eng.buffalo.edu:mwbecker): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:

You will get a login screen welcoming you to the ftp site and some notice like "230 Guest login ok, access restrictions apply."

Once inside, you can use the above commands to get around. Use the dir command to list the directories at the site. You will see one called "pub" for public. This is the standard place to put downloadable files at ftp sites.

Use the cd command to navigate to the subdirectory in pub/ called mwbecker. Download the file "ftptest1.txt" from the directory using the get command. You should get a session that looks something like this:

ftp> get ftptest1.txt
200 PORT command successful.
150 ASCII data connection for ftptest1.txt (128.205.128.67,32830) (6901 bytes).
226 ASCII Transfer complete.
local: ftptest1.txt remote: ftptest1.txt
7008 bytes received in 0.017 seconds (407.34 Kbytes/s)
ftp>

Notice that ftp says it is opening an ASCII connection. This means that it is transferring the file as text characters, rather than as raw binary data (0's and 1's). It is important that you specify the right transfer type: if you download text as binary, it may not read right on a PC, though it is usually OK on a Unix machine. If you download a binary file (say an Excel spreadsheet or a program) as ASCII, it won't function on a PC or a Unix machine.
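One reason the transfer type matters, stated here as general FTP behaviour rather than something shown in the session above, is that an ASCII transfer may translate line endings between the PC convention (carriage return plus newline) and the Unix convention (newline only). The sketch below imitates that translation locally with tr, with no ftp connection involved, to show that it changes the bytes of a file. That is harmless for text but ruinous for a program or spreadsheet.

```shell
# A small file with PC-style line endings (carriage return + newline).
dos_text=$(mktemp)
printf 'line one\r\nline two\r\n' > "$dos_text"

# Count the bytes before and after stripping the carriage returns,
# which is roughly what an ASCII-mode download to Unix would do.
before=$(( $(wc -c < "$dos_text") ))
after=$(( $(tr -d '\r' < "$dos_text" | wc -c) ))

echo "$before bytes before, $after bytes after"   # 20 bytes before, 18 bytes after
```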

If you wanted both ftptest1.txt and ftptest2.txt, you can use the mget command using wildcards. For example, type mget *.txt to get all the txt files:

ftp> mget *.txt
mget ftptest1.txt? y
200 PORT command successful.
150 ASCII data connection for ftptest1.txt (128.205.128.67,32832) (6901 bytes).
226 ASCII Transfer complete.
local: ftptest1.txt remote: ftptest1.txt
7008 bytes received in 0.016 seconds (422.38 Kbytes/s)
mget ftptest2.txt? y
200 PORT command successful.
150 ASCII data connection for ftptest2.txt (128.205.128.67,32833) (3595 bytes).
226 ASCII Transfer complete.
local: ftptest2.txt remote: ftptest2.txt
3652 bytes received in 0.0085 seconds (418.44 Kbytes/s)
ftp>

List to the screen the contents of the files that you downloaded. If you immediately recognize the text, you probably never had a date in high school.

Exercise:

FTP the files from the site again, this time specifying a binary transfer. Examine the result. Does it look the same?

Putting (Sending) Files

Sending files to a remote location is a trickier matter, because unix systems don't like outsiders invading their directories. You can, however, put files into directories that you own, quite easily. This is done just as you used the get and mget, but you use put and mput. As you will probably not have to do this in this class, we will not go through an example.
