Perl CGI Security Notes by Chris

Projects of
Chris X Edwards

Objective

After hearing how some relatively simplistic attacks against Perl CGI programs have caused trouble at my institution, I started to worry that I may have been putting a lot of naive code out there. I wanted to learn the fundamental mechanisms that can be used when exploiting a Perl CGI program and do my best to limit the liability of my software. There is a lot of information on this topic, but it is not very concise or centralized. This document attempts to be a collection of issues that Perl CGI programmers should be aware of. Unlike many documents with this kind of information, my perspective is that of the programmer trying to be defensive, not of the black hat trying to be naughty. If I have made any mistakes or overlooked other obvious threats, please feel free to contact me.

Play Along At Home

If you really want to get a feel for Perl CGI vulnerabilities, then you really should try some out. However, poking around your favorite on-line merchant's shopping cart is likely to get you a visit from the FBI. Fortunately, I have written a nice collection of programs designed to be vulnerable to various exploits. I recommend running these on a closed system. You'll need a working copy of Apache, Perl, and the CGI.pm module. I found that a Mac OS X laptop works great and can be self-contained. The scripts I have prepared are very, very simple and will be easier to deal with than writing your own, but, hey, do that too! Let me know what you come up with.

Trust Nothing

The Internet is a scary place. If you are planning to offer some software service to the general public over the Internet, you must increase your awareness of what the web user has control over and the ramifications of this. The first thing to understand is that the CGI interface using HTML forms is merely a courtesy to well-intentioned users. An attacker using your script as a point of weakness will often bypass the HTML form completely. Perhaps you put a maximum limit on a text field's length. If you say:

<input type=text name=protein maxlength=12>

in your form, you might think that you wouldn't have to worry about the value of param('protein') being longer than that. Wrong! By specifying the variable in the URL, the naughty user bypasses this restriction.

http://www.xed.ch/cgi-bin/pnb.pl?protein=#!/bin/bash;#A%20complete%20nasty%20script%20follows...

The naughty user can supply your script with variables that you never mentioned. Imagine if your script automatically does something with all variables it receives (like write a file named after the variable) under the assumption that all of them are sanctioned by the program. This is a very fragile system that is easy to abuse.

The user is not constrained by the contents of a selection list. Just because those were the only options you wanted to make available to your program doesn't mean those are the only ones the attacker can use.

Users can also change the values of "hidden" form elements. It may even come as a surprise to some beginners to learn that hidden form elements are in no way effectively hidden from the user. These variables are simply undisplayed by default. They can be viewed and even sent back with different values.

Lesson

Users are not constrained by the limitations imposed by CGI form elements. Things like maximum length and limited selection sets should be regarded as a suggestion only. What the user really gives you could be anything.
Don't automatically process all variables a program might receive. Only pay attention to the variables that your program really understands. Maybe even think about being on the lookout for and logging spurious variables.
If you use cookies, don't depend on them. Assume the user will maliciously modify them (or send you cookies you never set).
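
The first two lessons can be sketched in a few lines of Perl. This is only a rough illustration, not a complete defense: the parameter names, the length limits, and the screen_params function are all assumptions made up for the example, and the parameters are passed in as a plain hash so the sketch runs without a web server.

```perl
use strict;
use warnings;

# Hypothetical policy: the only parameters this script understands,
# each with a maximum length enforced on the server side.
my %MAX_LENGTH = (protein => 12, organism => 40);

# Split submitted parameters into accepted values and spurious names.
sub screen_params {
    my (%submitted) = @_;
    my (%accepted, @spurious);
    for my $name (sort keys %submitted) {
        my $value = $submitted{$name};
        if (exists $MAX_LENGTH{$name}
            && defined $value
            && length($value) <= $MAX_LENGTH{$name}) {
            $accepted{$name} = $value;
        }
        else {
            push @spurious, $name;    # unknown name or oversized value
        }
    }
    return (\%accepted, \@spurious);
}

my ($ok, $bad) = screen_params(
    protein => 'P53_HUMAN',
    debug   => '1',               # never mentioned in the form
);
print "accepted: @{[ sort keys %$ok ]}\n";
print "rejected: @$bad\n";        # worth logging
```

In a real script the hash would come from the CGI parameters, and the rejected names would go to a log rather than to the page.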

Character Reference

Novices are often given the good advice to steer clear of cartoon expletives. In other words, don't use any #@^!$~ special characters for any reason. Yet you still find people who have been warned about this kind of thing naming files like this: Prices\ in\ $.xls

I avoid these characters like the plague. Maybe more programmers and users would too if they understood just what mischief each and every one could do. So here's a big but not comprehensive list describing many bad characters and the potential problems associated with them. At the very least, it's helpful to know what's at the top of the list to defend against if you must use a special character. This list is oriented towards the character's behavior in bash, although a lot of these issues extend by conventions to other systems and environments. What's worse is that in other environments, these characters may have yet more obscure and dangerous problems to contend with.

Character  Name                                         Problem
`          Command Substitution                         Allows arbitrary commands to be executed.
$          Variable and Command Substitution            Expands variables and, as $(command), runs arbitrary commands.
( )        Command Substitution                         Used as $(command), equivalent to backticks.
;          Command Separator                            Allows attacker to write entire scripts in an input form box.
|          Pipe                                         Redirects output to a new arbitrary process (which doesn't even have to care about its stdin).
/          Directory Separator                          Helps attacker escape from a safe directory.
\          Character Escape                             Evades bad character checks: echo -e "bogus\x7crm" produces "bogus|rm".
{ }        Parameter Expansion, Brace Expansion         Lots of obscure functionality. Allows attacker to try several things at once.
&          Redirecting File Descriptors, Job Control    Changes stdout, stderr, etc.
'          Protective Quote                             Used to hide special shell characters for later execution.
"          Mild Quoting                                 Allows for filenames/strings like "rm -r" (space included).
*          Pathname Expansion                           Finds out or deletes the contents of a directory.
?          Pathname Expansion                           Lets the attacker get by knowing less about the attacked system.
<          Redirect Input                               Changes source of input, creates heredocs and herestrings. Used to read sensitive files.
>          Redirect Output                              Allows messing with filesystem, e.g. overwriting/appending critical files.
[ ]        Conditional Expressions, Pathname Expansion  Allows fishing for the existence of various files.
!          History Substitution                         Allows picking through command history file. #! reveals current command script uses.
^          History Substitution                         Relatively inert.
~          Tilde Expansion                              Accesses home directories of accounts.
-          Option Specification                         Although a common and reasonable character in filenames, can cause option confusion.
#          Comment, History Substitution                Hides things to get past shell. Also has about 5 other minor functions in bash.
\n         Newlines                                     New lines where not expected are bad.
\r         Carriage Returns                             Can be confusing too.
space      Token Separator                              Can mess up command lines. Causes action on multiple unintended items.
Ctl Chars  Embedded Control Characters                  Cause obfuscated mischief.

Lesson

Don't make funky characters a standard part of your file naming scheme or command line options. Ban them when you can.
Don't waste too much time trying to prohibit all bad characters - it's too easy to overlook something. Instead confirm that input contains only good characters.
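
Confirming that input contains only good characters might look something like the sketch below. The set of allowed characters (letters, digits, period, underscore) and the function name has_only_good_chars are assumptions for illustration; pick whatever set your program really needs. Note the \A and \z anchors: the more familiar $ also matches just before a trailing newline, which would let one through.

```perl
use strict;
use warnings;

# Return true only when the whole string is made of characters we
# have explicitly decided to allow (letters, digits, dot, underscore).
sub has_only_good_chars {
    my ($input) = @_;
    return 0 unless defined $input && length $input;
    return $input =~ /\A[A-Za-z0-9._]+\z/ ? 1 : 0;
}

for my $candidate ('report_2005.txt', 'Prices in $.xls', 'a;b') {
    print "$candidate: ",
        (has_only_good_chars($candidate) ? 'ok' : 'rejected'), "\n";
}
```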

Traversal Vulnerabilities

Where data is stored on the filesystem must be given some thought. Perhaps you have a directory where untrusted users can access public files through your CGI scripts. Here is a very simplified example of a program that allows a web user to retrieve the contents of a file in a safe directory.

$PATH="/tmp/ok/";             # the "safe" directory (note the trailing slash)
$n=param('name');             # untrusted input, used unchecked
open (FILE, "<$PATH$n");
print while (<FILE>);
close (FILE);

The problem here is that the CGI user can also read a file from any other directory that the web server's account can read (often root, a whole different bad problem). Even though a path is hard coded in the script, the CGI user needs only to enter something like this in the form that asks for the file name:

../../etc/passwd

Also don't forget that the attacker can easily read your HTML source, so a JavaScript validation isn't going to work since the attacker can just send the sneaky request in the URL:

http://www.xed.ch?name=../../etc/passwd

Lesson

If periods shouldn't be in your input, don't allow them.
If your input can contain periods as in "file.txt", consider filtering double periods.
Watch for backslashes escaping periods you think you're filtering.
JavaScript provides no security for CGI Perl programs. It can always be circumvented.
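
Here is one way the first two lessons might be combined, as a minimal sketch. The base directory is borrowed from the example above, and the safe_path function and its allowlist are hypothetical. Because the allowlist has no slash or backslash in it, escaped periods and directory separators are rejected along with everything else unexpected.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical base directory, borrowed from the example above.
my $BASE = '/tmp/ok';

# Return a path inside $BASE, or nothing when the name looks unsafe.
sub safe_path {
    my ($name) = @_;
    return unless defined $name;
    return unless $name =~ /\A[A-Za-z0-9._]+\z/;  # no slashes, backslashes, or nulls
    return if $name =~ /\.\./;                    # no double periods
    return File::Spec->catfile($BASE, $name);
}

for my $name ('file.txt', '../../etc/passwd') {
    my $path = safe_path($name);
    print "$name => ", (defined $path ? $path : 'rejected'), "\n";
}
```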

"open()" Vulnerabilities

The open command in Perl is very naughty. Since Perl's creators went out of their way to make this command as easy to use as possible, it is easy for attackers to use too. The worst problem comes from the default behavior which is to open a file for reading. Here is a simple usage:

$n=param('name');
open (FH, $n);
#.... It doesn't matter what's here, it's too late.
close (FH);

You might think that this allows the user to specify a name and if it exists, it will be opened. That's true; the program will work that way. But there is much more functionality hidden in there. Basically this construction allows an outside user to do pretty much anything to the filesystem (that the web user can do).

Many people don't realize that the open command supports very powerful pipes. Think of it like this: when a file is opened for reading or writing, data is prepared to be taken from or sent to that file. When a pipe is opened for reading or writing, data is prepared to be taken from or sent to some arbitrary process. So for example, here is a legitimate use of a pipe:

open (PIPE, "/usr/bin/cal|");
print while (<PIPE>);
close (PIPE);

This bit of code would give Perl access to the data produced by the cal (calendar) program. This data isn't in a static file (it is time sensitive), but it can be used as if it were. Data can be sent to pipes too. Pipes are powerful and can get complex, but all that one needs to know is that by using a pipe, the open command provides the capability to run any arbitrary command. In the previous example, it doesn't present a problem since the programmer has complete control over what is being run. But be careful when the name being opened involves any input from the user.

Looking at the first open example again, imagine the user inputs something like this:

http://www.xed.ch?name=/sbin/shutdown%20-h|

The data the user will be reading at that point is "The system will be shut down immediately! [etc]". That's probably not what the programmer had intended.

One of the best defenses against this problem is to never use open without specifying exactly what kind of operation you intend. The open command above should be specified:

open (FH, "<$n");

Now it is clear that this is for reading. It's still not a bad idea to exclude pipes and other naughty characters from $n.

Before you completely relax and feel comfortable that this problem is easy to avoid, be warned that even with the explicit read character, <, a very, very clever user might still be able to make mischief. It turns out that Perl has so much functionality packed into the open command that one can open file descriptors, and the syntax for that also starts with <. Therefore, if the user supplies a value for $n of "&=3", file descriptor 3 gets opened for input. This is an unlikely exploit, but it reminds us that unexpected functionality lurks everywhere and people who specialize in exploiting that will know what to do with it better than non-specialists.

Lesson

Check for user specified files to exist and be in order before handing them to the open command.
Unless constrained by a need to run an ancient version of Perl (before 5.6, released in 2000), always use the three-argument version of the open command, which separates the mode from the actual filename. Eliminating this ambiguity closes this hole. Much better: open(FH, "<", $n);
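
To see the difference for yourself, here is a small sketch using the three-argument form. The function name read_file_strictly and the hostile-looking filename are made up for the example. With the mode given separately, a trailing pipe character is just part of a (presumably nonexistent) filename and nothing gets executed.

```perl
use strict;
use warnings;

# Open a file strictly for reading. With the three-argument form the
# mode is fixed by the programmer, so characters such as "|", ">" or
# "&" in $name are treated as part of a literal filename.
sub read_file_strictly {
    my ($name) = @_;
    open(my $fh, '<', $name) or return;   # failure: no such file, etc.
    local $/;                              # slurp the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# A pipe attempt is now just a filename that does not exist.
my $result = read_file_strictly('/bin/echo pwned|');
print defined $result ? "read something\n" : "open refused: not a file\n";
```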

"system" vs. "exec" vs. "fork" vs. "qx{}" vs. "`cmd`"

Because Perl was designed to do everything that shell scripting could do, it has no shortage of ways you can mess with the host system. Perl is often called a glue language for its ability to bind other programs into a coherent system. This is useful for CGI programmers who want to put a web interface on some non Perl program. How should Perl call these underlying programs? Carefully.

The most important thing to remember is that if the user has any control over what Perl will be running (a user specified option or argument, etc) then extreme care must be taken to avoid allowing the user to slip in some magic that allows unintended processes to commence. Basically all of the bad characters are suspect when using Perl to start other processes.

As for which form of process spawner to use, there are many options. Here are the official descriptions of the ones I know about. I personally use the new qx{ } style since it seems intended to handle the normal cases where the programmer wants to run an external program.

fork Does a fork(2) system call to create a new process running the same program at the same point.

exec The "exec" function executes a system command and never returns-- use "system" instead of "exec" if you want it to return. It fails and returns false only if the command does not exist and it is executed directly instead of via your system's command shell (see below).

system Does exactly the same thing as "exec", except that a fork is done first, and the parent process waits for the child process to complete. Note that argument processing varies depending on the number of arguments.

qx{} A string which is (possibly) interpolated and then executed as a system command with "/bin/sh" or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected.

`cmd` Older syntax for qx{} based on shell command substitution.

With any of these commands, it's best to assume that the PATH environment variable has been tampered with. Even if your program didn't cause that vulnerability, once an attacker has altered the default PATH, your program can be made to do bad things. Use explicit absolute pathnames for each command you execute and do not rely on an automatic search through the PATH to find things. Also use Perl builtin commands where possible as opposed to Unix shell commands (grep and unlink are good examples).

# Not very secure:
print qx{grep $find $file};

# A tiny bit better:
print qx{/bin/grep $find $file};

# Best to avoid the shell completely. Much better:
open(FH,'<',$file);
/$find/ and print while <FH>;
close FH;

Lesson

As with the open command, the programmer must be extremely careful about what arguments are used for any of these execution commands.
Think about how you are quoting your command. Can the shell do crazy things with it?
The system command has a nice ability to take arguments as a list and avoid handing unintended consequences to a shell. Check that out and use it.
Use explicit absolute pathnames.
Use Perl builtin functionality when possible instead of spawning shells. Your program will probably run faster and with fewer resources as a bonus.
Considering using the eval command for some reason? Don't...
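
The list form mentioned above can be sketched like this. Since system itself does not hand back the program's output, this example uses the list form of a piped open, which avoids the shell in the same way. The function name capture_without_shell is hypothetical, and /bin/echo is an assumed location that may differ on your system.

```perl
use strict;
use warnings;

# Run a program with a list of arguments and capture its output.
# With the list form of a piped open, no shell is involved, so shell
# metacharacters in the arguments reach the program as plain text.
sub capture_without_shell {
    my ($program, @args) = @_;
    open(my $out, '-|', $program, @args) or return;
    my @lines = <$out>;
    close $out;
    return join '', @lines;
}

# /bin/echo is an assumed path; adjust it for the local system.
# The semicolon is echoed back, not treated as a command separator.
print capture_without_shell('/bin/echo', 'hello; cat /etc/passwd');
```

The same string interpolated into qx{} would be handed to the shell, which would happily treat the semicolon as the start of a second command.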

Taint Mode

Since enough people have been burned by Perl's tendency to be as helpful to bad guys as it is to programmers, the developers have created a special mode to help protect systems from mischief. This mode is called the taint mode and is activated by starting perl like this:

#!/usr/bin/perl -T

When taint mode is in effect, all data that originates from some external source is restricted so that it cannot be used to affect anything else outside your program. For example, if you read some user input into a variable, that variable cannot be used as a filename or command. It can be used in a print statement, however, since printing itself is assumed to be fairly safe from unintended consequences.

To sanitize some externally contributed data, you must employ backreferences from a regular expression. The concept is that if you spent enough effort to program a regular expression to condition this data, then you probably filtered out anything bad. Of course, you can make a mistake in this stage, but at least Perl isn't going to let some subtle and forgotten piece of data get by without some consideration.

#!/usr/bin/perl -T
use CGI qw(param);
$n=param('name');                      # $n is tainted
if ($n =~ m/^([a-zA-Z0-9._]+)$/) {     # $n is still tainted,...
    open (FILE, "<", $1) or die;       #     ... but the back reference (in parens) is not
    print while (<FILE>);
    close (FILE);
}

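Taint Trivia: Data can be untainted by making it a key to a hash, since hash keys never carry taint. Taint mode also distrusts the command search path: $ENV{PATH} must be set to a known value before running external commands.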

Lesson

Use taint mode! If it breaks your program, then you really should use it.
Think hard about your sanitizing regular expressions. Testing them out is never a bad idea.
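
One way to make a sanitizing expression easy to test is to wrap it in a small function, as in this sketch. The name untaint_filename and the allowed character set are assumptions for illustration. The tainted function from Scalar::Util reports whether a value is tainted; it always says no unless the script is actually run with -T.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Return an untainted copy of $value when it matches the allowlist,
# or nothing at all when it does not. Testing the match result matters:
# after a failed match, $1 still holds whatever an earlier match left.
sub untaint_filename {
    my ($value) = @_;
    return unless defined $value;
    if ($value =~ /\A([A-Za-z0-9._]+)\z/) {
        return $1;
    }
    return;
}

# Under -T, a value taken from @ARGV is tainted; the default is not.
my $raw   = defined $ARGV[0] ? $ARGV[0] : 'file.txt';
my $clean = untaint_filename($raw);
printf "raw tainted: %s, clean tainted: %s\n",
    (tainted($raw) ? 'yes' : 'no'),
    (defined $clean && tainted($clean) ? 'yes' : 'no');
```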

Poison Null Byte

Imagine that we have a situation where we have some sensitive data (like a password or key file) which lives in the same directory as user data. We want to be able to let the user specify any file but the restricted one and then use that as a parameter to a C program that does something special with it.

$n=param('name'); 
die if ( $n eq "restricted.data" ); # Can't let the webuser use this file!
qx{/usr/local/bin/specialCprogram --data-file $n};

This program demonstrates a very specific concept, so ignore the otherwise lax security. Imagine now that the user provided a url like this:

http://www.xed.ch/cgi-bin/pnb.pl?name=restricted.data%00Noproblemshere

When Perl gets this, the string is definitely not equal to the sensitive filename, so the check passes. Perl can deal with the null character (character zero) as if it were any other. On the other hand, C/C++ uses this character as a signal that it has arrived at the end of a string. So when the C program parses the input, it will see "restricted.data" and then the null byte signal and it will think the data is finished. The file it opens up and processes would be "restricted.data". Doh!

You can clean up any input that might suffer from this problem by doing this:

$insecure =~ s/\0//g;

or, this is faster (and fatal):

die if $insecure =~ tr/\0//;

Lesson

Never keep sensitive information in the same directory a user has any access to.
The poison null byte is tricky and difficult to imagine in an effective specific exploit. But that's what makes it worth keeping in mind so you don't create a perfect target for it.
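
The mismatch between what Perl compares and what a C program would see can be sketched without any C at all. In this illustration the functions as_c_string and has_null_byte are made up for the example; as_c_string merely imitates C by keeping everything before the first null byte.

```perl
use strict;
use warnings;

# What a C program would see: everything before the first null byte.
sub as_c_string {
    my ($value) = @_;
    my ($visible) = split /\0/, $value, 2;
    return defined $visible ? $visible : '';
}

# True when the value contains at least one null byte (tr counts them).
sub has_null_byte {
    my ($value) = @_;
    return ($value =~ tr/\0//) ? 1 : 0;
}

my $n = "restricted.data\0Noproblemshere";
print "Perl comparison matches: ",
    ($n eq 'restricted.data' ? 'yes' : 'no'), "\n";
print "C would open: ", as_c_string($n), "\n";
print "contains null byte: ", (has_null_byte($n) ? 'yes' : 'no'), "\n";
```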

Chris X. Edwards ~ January 2005
