Logfile Analysis

Basic File Handling


Searching through files and logs

Question 1: Download a log

Do these questions as user "root", and save all files in root's home directory, /root.

Use the command wget to download one of my server's log files. You need to do:

wget http://linuxzoo.net/data/short.log -O log
This downloads one of my weblogs and saves it into a file called "log".

Question 2: Any 404

Look for "file not found" errors in this weblog. This is error 404. Although not a perfect method, you can do this by searching for " 404 " in the log. The spaces are important; without them, a search for "404" would also match text such as "404hello".

Once found, save all of the matching lines to a file called "notfound".
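
As a rough sketch of why the spaces matter, the following runs the search on three made-up log lines rather than on the real log. The sample lines, the 192.0.2.x addresses and the temporary file names are all assumptions for illustration only.

```shell
# Hypothetical sample lines, not taken from the real log.
tmp=$(mktemp -d)
cat > "$tmp/sample.log" <<'EOF'
192.0.2.10 - - [01/Jan/2020:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043
192.0.2.11 - - [01/Jan/2020:00:00:02 +0000] "GET /missing.html HTTP/1.1" 404 209
192.0.2.12 - - [01/Jan/2020:00:00:03 +0000] "GET /404hello.html HTTP/1.1" 200 512
EOF

# With a space on each side, only the line whose status field is 404 matches;
# the request for /404hello.html is skipped.
grep " 404 " "$tmp/sample.log" > "$tmp/sample_notfound"
cat "$tmp/sample_notfound"
```

On the real data the same grep would read from "log" and redirect into "notfound".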

Question 3: The IP numbers

Process the "notfound" file and save a list of only the IP numbers of each log entry. This can be done using

cut -f1 -d" " filename
This gives the first "field" of the file "filename", where fields are delimited using the space " " character. Save that info to a file called "ip".
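
To see what cut does with a space delimiter, here is a small sketch on two made-up lines. The addresses and file names are hypothetical; the real exercise would read "notfound" and write "ip".

```shell
# Hypothetical 404 lines, standing in for the contents of "notfound".
tmp=$(mktemp -d)
cat > "$tmp/demo_notfound" <<'EOF'
192.0.2.11 - - [01/Jan/2020:00:00:02 +0000] "GET /missing.html HTTP/1.1" 404 209
198.51.100.7 - - [01/Jan/2020:00:00:09 +0000] "GET /gone.html HTTP/1.1" 404 209
EOF

# Keep only field 1 (everything before the first space) of each line.
cut -f1 -d" " "$tmp/demo_notfound" > "$tmp/demo_ip"
cat "$tmp/demo_ip"
```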

Question 4: Duplicates

If you "cat ip" you will see that there are many duplicate IPs shown. The "sort -u filename" command sorts that file and removes duplicate lines, so each entry appears only once. The entries are sorted alphanumerically. What is the last IP printed if this uniqueness processing is applied to the ip file?
Last Unique IP:
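
One possible way to pick out the last line is to pipe the sorted output through "tail -n 1". The sketch below uses a made-up list of addresses in a temporary file, not the real "ip" file. Note that sort compares lines as text rather than as numbers.

```shell
# Hypothetical list with a duplicate, standing in for the "ip" file.
tmp=$(mktemp -d)
printf '%s\n' 198.51.100.7 192.0.2.11 203.0.113.5 192.0.2.11 > "$tmp/demo_ip"

# sort -u orders the lines and drops duplicates; tail -n 1 keeps the final line.
last=$(sort -u "$tmp/demo_ip" | tail -n 1)
echo "$last"
```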

Question 5: How many times

How many times does the above IP exist in the full log file "log"?
Count of Last Unique IP:
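
Counting matching lines can be done with "grep -c", which prints the number of lines that match instead of the lines themselves. This is only a sketch on invented data; the address and log lines are assumptions.

```shell
# Hypothetical log lines, standing in for the full "log" file.
tmp=$(mktemp -d)
cat > "$tmp/demo.log" <<'EOF'
203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 100
192.0.2.11 - - [01/Jan/2020:00:00:02 +0000] "GET /b.html HTTP/1.1" 404 209
203.0.113.5 - - [01/Jan/2020:00:00:03 +0000] "GET /c.html HTTP/1.1" 404 209
EOF

# -c prints how many lines contain the pattern.
count=$(grep -c "203.0.113.5" "$tmp/demo.log")
echo "$count"
```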

Question 6: Duplicates

Save a list of all the unique IP numbers from log into a file called uip.

Question 7: Frequency

For each line of uip, say how many times that line occurs in log. The output should be a single number on each line, giving the count. Save the answer in freq. Use xargs, but make sure in your grep that an IP does not also match a longer IP that begins with the same characters (ignore the issue that the dots are unescaped). Only save the biggest 10 counts, ordered in descending order.
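
The following is a minimal sketch of one way to combine xargs and grep for this, run on invented data. It assumes the IP is followed by a space in each log line, so a trailing space in the pattern stops a short IP matching a longer one. The addresses and the demo file names are hypothetical.

```shell
# Hypothetical log in which 192.0.2.1 is a prefix of 192.0.2.11.
tmp=$(mktemp -d)
cat > "$tmp/demo.log" <<'EOF'
192.0.2.1 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 10
192.0.2.1 - - [01/Jan/2020:00:00:02 +0000] "GET /b.html HTTP/1.1" 200 10
192.0.2.11 - - [01/Jan/2020:00:00:03 +0000] "GET /a.html HTTP/1.1" 404 10
192.0.2.1 - - [01/Jan/2020:00:00:04 +0000] "GET /c.html HTTP/1.1" 200 10
198.51.100.7 - - [01/Jan/2020:00:00:05 +0000] "GET /a.html HTTP/1.1" 200 10
198.51.100.7 - - [01/Jan/2020:00:00:06 +0000] "GET /b.html HTTP/1.1" 404 10
EOF

# Unique IPs, one per line (the role played by "uip").
cut -f1 -d" " "$tmp/demo.log" | sort -u > "$tmp/demo_uip"

# For each IP, count the lines containing "IP " (note the trailing space),
# then sort the counts numerically, largest first, and keep at most 10.
xargs -I{} grep -c "{} " "$tmp/demo.log" < "$tmp/demo_uip" \
  | sort -rn | head -n 10 > "$tmp/demo_freq"
cat "$tmp/demo_freq"
```

Without the trailing space, the pattern for 192.0.2.1 would also count the 192.0.2.11 line.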

Question 8: Frequency

The problem with the freq file is that you cannot tell which IP has what frequency. Fix that by redoing the above question and making sure the output is in the format of ip,count. For example, the line for an IP that occurs 5 times would end in ",5". Save the answer to freq2.

Hint: the command given to xargs needs to be in the format:

sh -c "echo -n {},;grep -c '{} ' log"
This executes a new shell, which first echoes (prints) the parameter followed by a comma, with no newline at the end. The semicolon marks the end of that command and the start of the next, which is the normal grep.
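
To show how the hint might fit into a complete pipeline, here is a sketch on invented data. It assumes that the sh on your system treats "echo -n" as "no trailing newline", as on typical Linux installs. The addresses, file names and the final sort on the second comma-separated field are illustrative choices, not the only way to do it.

```shell
# Hypothetical log and unique-IP list, standing in for "log" and "uip".
tmp=$(mktemp -d)
cat > "$tmp/demo.log" <<'EOF'
192.0.2.1 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 10
192.0.2.1 - - [01/Jan/2020:00:00:02 +0000] "GET /b.html HTTP/1.1" 200 10
192.0.2.11 - - [01/Jan/2020:00:00:03 +0000] "GET /a.html HTTP/1.1" 404 10
192.0.2.1 - - [01/Jan/2020:00:00:04 +0000] "GET /c.html HTTP/1.1" 200 10
198.51.100.7 - - [01/Jan/2020:00:00:05 +0000] "GET /a.html HTTP/1.1" 200 10
198.51.100.7 - - [01/Jan/2020:00:00:06 +0000] "GET /b.html HTTP/1.1" 404 10
EOF
cut -f1 -d" " "$tmp/demo.log" | sort -u > "$tmp/demo_uip"

# For each IP, a new shell prints "IP," without a newline and then the count.
# The result is sorted on field 2 (the count), numerically, largest first.
xargs -I{} sh -c "echo -n {},;grep -c '{} ' '$tmp/demo.log'" < "$tmp/demo_uip" \
  | sort -t, -k2,2nr | head -n 10 > "$tmp/demo_freq2"
cat "$tmp/demo_freq2"
```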

