Find: The Mysteries Explained

July 3, 1997

This page will discuss how to do two elementary things which are important parts of code development:

  1. Find the source of a given subroutine, function or include file.
  2. Find every occurrence of a particular character string in the entire source.
The discussion is given in the context of MCFast, but much of it applies to code development on any Unix machine.

The scripts mentioned below are found on fnsimu1 in $MCFAST_DIR/bin.

  1. Finding Include Files.


    The script mcfind_inc will search all of the source trees used by MCFast and evgen to find the include file which is specified as a command line argument. For example,
    mcfind_inc stdhep

    As in all of these examples, do NOT specify a file type (.inc).

    In the system adopted by MCFast, included files are resolved in a preprocessing step (the cpp step) that runs before compilation. To see which directories are searched for include files at cpp time, look at the INC1=, INC2=, etc. lines in your GNUmakefile.
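    If you only want those lines, grep can pull them out of the makefile. The function below is a minimal POSIX sh sketch; the name show_inc_paths is hypothetical, and it assumes each include-path variable starts a line as INC followed by digits, possibly with spaces around the =.

```shell
# Hypothetical helper: print the INC1=, INC2=, ... lines of a makefile.
# Assumes the variables start at the beginning of a line (leading blanks allowed).
# Usage: show_inc_paths [MAKEFILE]   (defaults to ./GNUmakefile)
show_inc_paths() {
    grep '^[[:space:]]*INC[0-9][0-9]*[[:space:]]*=' "${1:-GNUmakefile}"
}
```

    For example, run show_inc_paths in the directory that holds your GNUmakefile, or pass the makefile's path as the first argument.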

    There is one important caveat. If you are looking for QQ include files, you should take them from $QQ_DIR/src/inc, not from any other place in the QQ_DIR hierarchy. This caveat is the main reason that there are separate commands to look for include files and other files: the search for include files covers a more restricted path.

    The script is quite short. It uses the unix "find" command, for which there is a tutorial below.
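    The sketch below shows what such a find-based script could look like, written as a POSIX sh function. It is not the actual mcfind_inc: the name find_inc is hypothetical, and the directories to search are passed as arguments because the real script's list of source trees is not reproduced here. As with mcfind_inc, the user gives only the base name and the .inc extension is added by the function.

```shell
# Hypothetical sketch in the spirit of mcfind_inc (POSIX sh, not csh).
# Usage: find_inc NAME DIR...
find_inc() {
    name=$1
    shift
    # Append the .inc extension here, so callers give only the base name.
    find "$@" -name "$name.inc" -print
}
```

    For example, find_inc stdhep $MCFAST_DIR would list every stdhep.inc below $MCFAST_DIR.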

  2. Finding Subroutine and Function Source Files, Part 1.

    In most cases MCFast code is stored one subroutine or function per file and the file has the name of the subroutine or function, usually with an extension of .F or .c. For such cases, a script, similar to the one discussed above, has been provided to find a given source file. For example,
    mcfind_src trk_trace_param
    mcfind_src qqdeca

    If the routine is not found by this procedure, there are two likely possibilities. One is that the routine comes from some system library or from the CERN libraries; this can be checked by looking at the map file. The second possibility is discussed below.
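    A find-based sketch in the same spirit looks for NAME.F or NAME.c. The POSIX sh function below is hypothetical: find_src is not the real mcfind_src, and the directories to search are passed as arguments rather than taken from the script's built-in list.

```shell
# Hypothetical sketch in the spirit of mcfind_src (POSIX sh, not csh).
# Usage: find_src NAME DIR...
find_src() {
    name=$1
    shift
    # \( ... \) groups the two -name tests; -o is a logical OR.
    find "$@" \( -name "$name.F" -o -name "$name.c" \) -print
}
```

    An empty result from such a search is the cue to check the map file or to fall back on a full-text search.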

  3. Finding Subroutine and Function Source Files, Part 2.

    In some cases, most notably with code written in C, there are several entry points in one file. To find the source for such routines, one must search every file in the source tree for the name of the requested function. A script is provided to do this, for example,
    mcsearch_src trk_trace_param

    This command will be slower than the others. Its output is a list of all lines in the source tree that contain the requested character string, together with the names of the files in which they were found. In each case the name of the file FOLLOWS the listing of matched lines. It should be clear from the listing where the function is defined and where it is referenced. This script only searches .F and .c files.
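    That listing format, matched lines followed by the file name, is what you get when find runs grep on each file and prints the file name only if grep succeeds. The function below is a hypothetical POSIX sh stand-in for mcsearch_src, not the real script; the case-insensitive -i is an assumption made here because Fortran names are case-insensitive.

```shell
# Hypothetical sketch in the spirit of mcsearch_src (POSIX sh, not csh).
# Usage: search_src STRING DIR...
search_src() {
    string=$1
    shift
    # For each .F or .c file, grep prints the matching lines; -print then
    # prints the file name, but only if grep succeeded (found a match).
    find "$@" \( -name "*.F" -o -name "*.c" \) \
        -exec grep -i "$string" {} \; -print
}
```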

  4. Finding Where a variable is given a value.

    This problem requires using the Unix utilities find and grep directly. For example, the following command shows every line in the .F files under $MCFAST_DIR/simulator that mentions the variable rpln_par(*).radius, where (*) represents any index:

    find $MCFAST_DIR/simulator -name \*.F -exec grep "rpln_par.*radius" {} \; -print

    It is usually straightforward to scan such a listing to find out where the variable is given a value and where it is used. The tutorial below explains each of the pieces of this example. Of course, this method will fail if the variable name is split over two lines.
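    When the listing is long, it can help to have grep put the file name and line number on every matching line instead of after each block. One portable trick, sketched below as a hypothetical POSIX sh function, is to hand grep the extra file /dev/null: with more than one file argument, grep prefixes each match with its file name, and -n adds the line number. The name grep_tagged is illustrative only.

```shell
# Hypothetical helper: search .F files and tag each match as file:line:text.
# Usage: grep_tagged PATTERN DIR...
grep_tagged() {
    pattern=$1
    shift
    # /dev/null is a second (empty) file, so grep always prints the file
    # name in front of each match; -n adds the line number.
    find "$@" -name "*.F" -exec grep -n "$pattern" {} /dev/null \;
}
```

    For example, grep_tagged 'rpln_par.*radius' $MCFAST_DIR/simulator prints lines of the form file:line:text.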

Tutorial on using find and grep

  1. Descend a source tree and print every file:
    % find $MCFAST_DIR -print | less
    The find command returns the name of every file below $MCFAST_DIR, directories included. One can think of the options to find as a little program that says what to do with each of those file names; in this example the little program just says to print them. On some machines -print is the default; on others it is not.

    The "| less" is just a suggestion. If you wish to see 5 minutes of printout fly past, you may remove it. To get out of the paged listing, type "q".
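    Because find lists directories as well as plain files, it is sometimes useful to restrict the output with the -type f test. The two hypothetical POSIX sh functions below contrast the two listings.

```shell
# Hypothetical helpers (POSIX sh).
# list_all: every path under DIR, including DIR itself and its subdirectories.
list_all()   { find "$1" -print; }
# list_files: only regular files under DIR.
list_files() { find "$1" -type f -print; }
```

    For example, list_files $MCFAST_DIR | less pages through the files alone.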

  2. Descend a source tree and print the name of every .F file:
    % find $MCFAST_DIR -name \*.F -print
    The \ "escapes" the *. That is, it tells the Unix shell: do not interpret the next character as a special character; just pass it to the find command. (Aside: when \ is the last character on a line, it "escapes" the newline character and is therefore effectively a continuation character. Most people think of \ as the continuation character; one should think of it as an "escape" character.)

  3. Or one can use quotes to escape the *:
    % find $MCFAST_DIR -name "*.F" -print
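    To see why the * must be protected at all, it helps to look at what the shell actually hands to a command. The hypothetical POSIX sh function below just reports its arguments. In a directory containing a.F and b.F, an unprotected *.F is expanded by the shell before find ever runs, so find would receive -name a.F b.F and reject the stray b.F; the escaped and quoted forms both pass the literal pattern through.

```shell
# Hypothetical helper: print the number of arguments received, then the arguments.
count_args() {
    printf '%s:' "$#"
    printf ' %s' "$@"
    printf '\n'
}

# In a directory that contains a.F and b.F:
#   count_args *.F      prints  2: a.F b.F   (the shell expanded the pattern)
#   count_args \*.F     prints  1: *.F       (escaped: passed through)
#   count_args "*.F"    prints  1: *.F       (quoted: passed through)
```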

  4. Descend a source tree and print the name of every .F and .c file:
    % find $MCFAST_DIR \( -name "*.F" -o -name "*.c" \) -print | less
    The parentheses group the two -name tests into one compound expression. The backslashes escape the parentheses; otherwise the shell would try to interpret them itself. The -o is a logical OR operator.
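    The parentheses matter because find joins adjacent tests with an implicit AND, which binds more tightly than -o. The two hypothetical POSIX sh functions below differ only in the parentheses; without them, -print attaches to the *.c test alone, so .F files match but are never printed.

```shell
# Without parentheses this parses as: -name "*.F" -o ( -name "*.c" -a -print )
without_parens() { find "$1" -name "*.F" -o -name "*.c" -print; }
# With parentheses, -print runs for a file matching either name.
with_parens()    { find "$1" \( -name "*.F" -o -name "*.c" \) -print; }
```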

  5. Descend a directory tree and print the names of any .F or .c files which contain the character string "string":

    % find $MCFAST_DIR \( -name "*.F" -o -name "*.c" \) \
       -exec grep -i string {} \; -print | less

    The lonely \ at the end of the first line is just the "continuation character" described above; it is included here only for formatting reasons and, if your window is wide enough, you may type the entire command on one line. The -exec says to take the clause delimited by the semicolon and to execute it for every file that passes the previous criteria. Note that the semicolon must be escaped; otherwise the shell will decide that the find command ends at the semicolon. (Many shells let one issue several commands, separated by semicolons, on one line. This is important inside Makefiles, but I have not seen it widely used within the HEP community.)

    The grep command is like the search command in VMS: it prints every line that contains the specified string. The {} operator says to insert the current file name at that point in the command; therefore grep operates on the current file. The -i says to ignore case when matching letters. The -print is evaluated only if the -exec returns a successful exit status, that is, only if grep found at least one line containing the string. For example,
    % find $MCFAST_DIR/simulator \( -name "*.F" -o -name "*.c" \) \
       -exec grep -i trk_trace_param {} \; -print
          integer function trk_trace_param(hep)
     520  trk_trace_param = 0
    9998  trk_trace_param = 1
    5000  format('trk_trace_param: event ', i6, ' at track ', i5,'.',
    9999  trk_trace_param = 2
    5001  format('trk_trace_param: Tracing aborted during event ', i6,
    c $Id$
    c $Log$
    /home/sim1/bphyslib/release/dev/simulator/track/src/trk_trace_param.F
          external trk_trace_param
          integer trk_trace_param
         & trk_trace_param')
          status = trk_trace_param(hep)
    /home/sim1/bphyslib/release/dev/simulator/track/src/trk_trace_scat.F
          external trk_trace_param, trk_trace_scat
          integer  trk_trace_param, trk_trace_scat
               status = trk_trace_param(hep)   !trace particles through detectors
    /home/sim1/bphyslib/release/dev/simulator/user/src/usr_before_trigger.F
    
    
    Notice that each file name is printed after the text from that file. From the above one can see that the routine trk_trace_param is defined in the file
    /home/sim1/bphyslib/release/dev/simulator/track/src/trk_trace_param.F and that it is called in two places, trk_trace_scat.F and usr_before_trigger.F.
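    The same exit-status mechanism can list only the names of the matching files. In the hypothetical POSIX sh sketch below, grep -q prints nothing and only reports success or failure, so find prints a file name exactly when that file contains the string.

```shell
# Hypothetical helper: names of .F/.c files containing STRING (any case).
# Usage: files_containing STRING DIR...
files_containing() {
    string=$1
    shift
    # grep -q prints nothing; its exit status alone decides whether -print runs.
    find "$@" \( -name "*.F" -o -name "*.c" \) \
        -exec grep -q -i "$string" {} \; -print
}
```

    For example, files_containing trk_trace_param $MCFAST_DIR/simulator would print just the three file names from the listing above.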

  6. Finally, the example mentioned above:
    % find $MCFAST_DIR/simulator -name \*.F -exec grep "rpln_par.*radius" {} \; -print
    In this case the new feature is the ".*" in the middle of the search pattern (in grep speak, a search pattern is called a regular expression). The "." says match any single character. The "*" says match the previous character zero or more times. Therefore the above will match any line that contains "rpln_par" followed, later on the same line, by "radius", with any number of characters, including none, in between.
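    A quick way to check what a regular expression matches is to feed grep a few sample lines. The lines below are invented for illustration, and sample_lines is a hypothetical helper (POSIX sh). Since grep looks for the pattern anywhere in a line, the third line, where radius comes before rpln_par, does not match.

```shell
# Hypothetical sample input; only the first two lines contain "rpln_par"
# followed later on the same line by "radius".
sample_lines() {
    printf '%s\n' \
        '      rpln_par(3).radius = 2.5' \
        '      if (rpln_par(n).radius .gt. rmax) then' \
        '      r = radius + rpln_par(1).z'
}

sample_lines | grep 'rpln_par.*radius'
```

    Running this prints the first two sample lines only.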


Rob Kutschke kutschke@fnal.gov

Lynn Garren garren@fnal.gov