

Logfile Analysis


Basic File Handling


Searching through files and logs


Question 1: Download a log

Do these questions as user "root", and save all files in root's home directory, /root.

Use the command wget to download one of my server's log files. You need to do:

wget http://linuxzoo.net/data/short.log -O log
This downloads one of my weblogs and saves it into a file called "log".


Question 2: Any 404

Look for "file not found" errors in this weblog. These have the status code 404. Although not a perfect method, you can find them by searching for " 404 " in the log. The spaces are important, otherwise a search for "404" would also match "404hello" etc.

Once found, save all of those lines to a file called "notfound".
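As a rough illustration of why the spaces matter, here is a small sketch run against three made-up log lines. The file names sample.log and sample_notfound, and the log lines themselves, are assumptions for the demo and are not taken from the real log.

```shell
# Hypothetical sample data in a scratch directory (not the real weblog).
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.3 - - [01/Jan/2024:10:00:09 +0000] "GET /404hello.html HTTP/1.1" 200 512
EOF

# Without spaces: counts the real 404 and also the "404hello" URL.
loose=$(grep -c "404" "$workdir/sample.log")
# With spaces: counts only the line whose status field is 404.
strict=$(grep -c " 404 " "$workdir/sample.log")
echo "loose=$loose strict=$strict"

# Redirect the matching lines into a file.
grep " 404 " "$workdir/sample.log" > "$workdir/sample_notfound"
```

On this sample the loose search counts two lines and the strict search counts one.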


Question 3: The IP numbers

Process the "notfound" file and save a list of only the IP numbers of each log entry. This can be done using

cut -f1 -d" " filename
This gives the first "field" of the file "filename", where fields are delimited using the space " " character. Save that info to a file called "ip".
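To see what cut does here, the sketch below runs it on a few made-up 404 lines. The file names sample_notfound and sample_ip and the log lines are assumptions for the demo only.

```shell
# Hypothetical 404 lines in a scratch directory (not the real weblog).
workdir=$(mktemp -d)
cat > "$workdir/sample_notfound" <<'EOF'
10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.7 - - [01/Jan/2024:10:02:11 +0000] "GET /old.html HTTP/1.1" 404 209
10.0.0.2 - - [01/Jan/2024:10:03:40 +0000] "GET /gone.html HTTP/1.1" 404 209
EOF

# -d" " splits each line on spaces, -f1 keeps only the first field (the IP).
cut -f1 -d" " "$workdir/sample_notfound" > "$workdir/sample_ip"
cat "$workdir/sample_ip"
```

The output keeps one IP per input line, so duplicates are still present at this stage.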


Question 4: Duplicates

If you "cat ip" you will see that there are many duplicate IPs shown. The "sort -u filename" command sorts that file and removes duplicate lines. The entries are sorted alphanumerically. What is the last IP printed if this uniqueness processing is applied to the ip file?
Last Unique IP:
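A minimal sketch of the uniqueness step, using a made-up list of IPs. The file name sample_ip and its contents are assumptions for the demo. Note that the ordering is character by character, so 10.0.0.11 sorts before 10.0.0.2.

```shell
# Hypothetical list of IPs with a duplicate (not taken from the real log).
workdir=$(mktemp -d)
printf '%s\n' 10.0.0.2 10.0.0.7 10.0.0.2 10.0.0.11 > "$workdir/sample_ip"

# sort -u orders the lines and drops repeated ones.
sort -u "$workdir/sample_ip"

# tail -n 1 keeps only the final line of the sorted output.
last=$(sort -u "$workdir/sample_ip" | tail -n 1)
echo "last=$last"
```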


Question 5: How many times

How many times does the above IP exist in the full log file "log"?
Count of Last Unique IP:
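One way to count matching lines is grep -c, sketched here on made-up data. The file sample.log and its abbreviated log lines are assumptions for the demo. The trailing space in the pattern is a simple guard so that one IP does not also count lines for a longer IP that starts with the same digits.

```shell
# Hypothetical, abbreviated log lines (not the real weblog).
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
10.0.0.1 - - "GET /a.html HTTP/1.1" 200 512
10.0.0.11 - - "GET /b.html HTTP/1.1" 404 209
10.0.0.1 - - "GET /c.html HTTP/1.1" 404 209
10.0.0.2 - - "GET /a.html HTTP/1.1" 200 512
EOF

# Without the trailing space, 10.0.0.11 is counted as well.
loose=$(grep -c "10.0.0.1" "$workdir/sample.log")
# With the trailing space, only lines for exactly 10.0.0.1 are counted.
# (Starting the pattern with ^ would also rule out matches later in the line.)
count=$(grep -c "10.0.0.1 " "$workdir/sample.log")
echo "loose=$loose count=$count"
```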


Question 6: Duplicates

Save a list of all the unique IP numbers from log into a file called uip.


Question 7: Frequency

For each line of uip, say how many times that line occurs in log. The output should be just a single number on each line, giving the count. Use xargs, but make sure your grep pattern cannot match part of a longer IP, e.g. a uip line of 10.0.0.1 must not match a log line for 10.0.0.11 (ignore the issue that the dots are unescaped). Only save the biggest 10 counts, in descending order. Save the answer in freq.
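The general shape of such a pipeline might look like the sketch below, run on made-up data. The files sample.log, sample_uip and sample_freq and the abbreviated log lines are assumptions for the demo. Here xargs -I{} runs one grep per input line, substituting the line wherever {} appears.

```shell
# Hypothetical, abbreviated log lines (not the real weblog).
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
10.0.0.1 - - "GET /a.html HTTP/1.1" 200 512
10.0.0.2 - - "GET /b.html HTTP/1.1" 404 209
10.0.0.1 - - "GET /c.html HTTP/1.1" 200 512
10.0.0.11 - - "GET /d.html HTTP/1.1" 404 209
10.0.0.2 - - "GET /a.html HTTP/1.1" 200 512
10.0.0.1 - - "GET /b.html HTTP/1.1" 404 209
EOF

# Unique IPs, one per line.
cut -f1 -d" " "$workdir/sample.log" | sort -u > "$workdir/sample_uip"

# One grep -c per IP; the space after {} stops 10.0.0.1 matching 10.0.0.11.
# sort -rn orders the counts numerically, largest first; head keeps the top 10.
xargs -I{} grep -c "{} " "$workdir/sample.log" < "$workdir/sample_uip" |
  sort -rn | head -n 10 > "$workdir/sample_freq"
cat "$workdir/sample_freq"
```

On this sample the counts come out as 3, 2 and 1, which shows the limitation of this format: the numbers alone do not say which IP they belong to.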


Question 8: Frequency

The problem with the freq file is that you cannot tell which IP has what frequency. Fix that by redoing the above question and making sure the output is in the format ip,count. For example, "10.0.0.1,5". Save the answer to freq2.

Hint: the xargs needs to be in the format:

sh -c "echo -n {},;grep -c '{} ' log"
This starts a new shell, which first runs echo to print the parameter followed by a comma, with no trailing newline (the -n option). The semicolon marks the end of that command and the start of the next, which is the normal grep.
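A sketch of how the hint could fit into a full pipeline, run on made-up data. The files sample.log, sample_uip and sample_freq2 and the abbreviated log lines are assumptions for the demo. Sorting on the second comma-separated field with sort -t, -k2,2nr is one possible way to order by count.

```shell
# Hypothetical, abbreviated log lines (not the real weblog).
workdir=$(mktemp -d)
cat > "$workdir/sample.log" <<'EOF'
10.0.0.1 - - "GET /a.html HTTP/1.1" 200 512
10.0.0.2 - - "GET /b.html HTTP/1.1" 404 209
10.0.0.1 - - "GET /c.html HTTP/1.1" 200 512
10.0.0.11 - - "GET /d.html HTTP/1.1" 404 209
10.0.0.2 - - "GET /a.html HTTP/1.1" 200 512
10.0.0.1 - - "GET /b.html HTTP/1.1" 404 209
EOF
cut -f1 -d" " "$workdir/sample.log" | sort -u > "$workdir/sample_uip"

(
  # Work inside the scratch directory so the file names stay short.
  cd "$workdir" || exit 1
  # For each IP: print "ip," without a newline, then the grep count.
  # -t, sets the field separator; -k2,2nr sorts field 2 numerically, descending.
  xargs -I{} sh -c "echo -n {},;grep -c '{} ' sample.log" < sample_uip |
    sort -t, -k2,2nr | head -n 10 > sample_freq2
)
cat "$workdir/sample_freq2"
```

If the sh on your system prints the -n literally, replacing "echo -n {}," with "printf '%s,' {}" is a more portable choice.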




Linuxzoo created by Gordon Russell.
@ Copyright 2004-2025 Edinburgh Napier University