
Salary

This is a taboo topic, so I’m probably not going to broadcast this one across social media.

I’m very fortunate. I earn decent money doing something I find interesting: programming. I don’t love sitting in front of a computer all day, but that’s where most of the programming happens, so I’ve learned to deal with it. I take frequent breaks, stand up, stretch, get in some push-ups and sit-ups, go outside. It’s taken me years to recognize this important balance between getting work done and maintaining my health.


NetLinx: SNAPI

In this post, we’ll explore the Standard NetLinx API, or SNAPI for short. This is one of those topics I didn’t fully embrace when I started programming AMX, but over time, I grew to appreciate the benefits of adhering to a standard.

I’ve updated the touchpanel layout in this post, so if you want to grab the latest code, it’s available on GitHub.


Analyzing Web Traffic

I have another website at dev.kielthecoder.com that isn’t used for much. But I was curious who might be visiting it and where they come from. I’m not doing any cookies or session tracking, so I only have the server log files to go off of. I want to demonstrate some UNIX commands that can be used to gather information.

Access Logs

NGINX stores its access logs in a very common format (the “combined” log format). By default, it looks like this:

$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"

There’s also a fair amount of log retention. I have 15 access logs saved (numbers 2 through 14 are compressed), and it looks like they rotate every day. If I look at yesterday’s log file, I can see (I’m masking the IP addresses since they aren’t mine):

$ head -1 /var/log/nginx/access.log.1
x.x.x.x - - [30/Jun/2021:00:05:44 -0400] "GET / HTTP/1.1" 301 185 "-" "-"

$ tail -1 /var/log/nginx/access.log.1
x.x.x.x - - [30/Jun/2021:23:58:11 -0400] "HEAD /epa/scripts/win/nsepa_setup.exe HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

We can use the head command to view just the first line of yesterday’s file (access.log.1) and tail to view just the last line. If I want to look at older, compressed logs, I can pass it through zcat first like this:

$ zcat /var/log/nginx/access.log.14.gz | head -5
x.x.x.x - - [17/Jun/2021:00:03:57 -0400] "GET /plugin.php?id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&page=13 HTTP/1.1" 301 185 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

x.x.x.x - - [17/Jun/2021:00:03:58 -0400] "GET /plugin.php?id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&id=xhuaian_makefriends:main&page=13 HTTP/1.1" 404 143 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

x.x.x.x - - [17/Jun/2021:00:06:20 -0400] "GET /robots.txt HTTP/1.1" 301 185 "-" "Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

x.x.x.x - - [17/Jun/2021:00:06:21 -0400] "GET /robots.txt HTTP/1.1" 404 143 "-" "Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

x.x.x.x - - [17/Jun/2021:00:14:13 -0400] "GET /robots.txt HTTP/1.1" 301 185 "-" "Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

Bots! Hey, if you’ve got a public web server, you’re going to get hit by lots of bots. The first two entries look like a bot trying to do something malicious with WordPress (Sorry, evil bot! No WordPress installed). The next three entries look like a well-behaved bot simply crawling my site.

Now that we can see what each line looks like, what can we do with them?

Most Requested URL

What if we want to figure out which URL is the most requested? This is where we can make good use of languages like Perl or Awk that specialize in working with text. Let’s start with this small program and name it urls.pl. It will tell us who is looking for robots.txt:

#!/usr/bin/perl

while (<>) {
   chomp;
   if (/robots\.txt/) {
      print "$_\n";
   }
}

This program reads from standard input and checks each line for the string robots.txt (note the escaped dot, so it matches literally). Make it executable (chmod +x urls.pl), then we can see some of the visiting bots by typing:

$ zcat /var/log/nginx/access.log.2.gz | ./urls.pl
18 results (mostly Google and Bing)
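As an aside, this particular filter doesn’t strictly need Perl; a grep one-liner in the same pipeline does the same job. Just a sketch of the equivalent:

```shell
# Same filter as urls.pl: print only the lines mentioning robots.txt.
# grep -F treats the pattern as a literal string, so the dot isn't a
# regex wildcard here.
zcat /var/log/nginx/access.log.2.gz | grep -F 'robots.txt'
```

Perl earns its keep in the next step, though, where we start pulling the lines apart.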

I want to use a regular expression to pull out each component of each line of the log file. I found online tools like regex101 really useful for writing and debugging the matching rules:

#!/usr/bin/perl

while (<>) {
    chomp;
    my ($remote_addr, $ident, $remote_user, $datetime,
        $request, $status, $body_bytes, $referer, $user_agent) =
        /^(\S+) (\S+) (\S+) \[([^]]+)\] "(.*)" (\d+) (\d+) "(.+)" "(.+)"$/;

    print $remote_addr, "|", $datetime, "|", $request, "|", $status, "|",
        $referer, "|", $user_agent, "\n";
}

This will assign each of the matches to a variable, like $request. Now we get results like this:

$ zcat /var/log/nginx/access.log.4.gz | ./urls.pl | head -5
x.x.x.x|27/Jun/2021:00:24:22 -0400|GET / HTTP/1.1|301|-|python-requests/2.24.0

x.x.x.x|27/Jun/2021:00:30:19 -0400|GET / HTTP/1.1|200|-|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

x.x.x.x|27/Jun/2021:00:38:24 -0400|POST /boaform/admin/formLogin HTTP/1.1|301|http://45.79.94.20:80/admin/login.asp|Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0

x.x.x.x|27/Jun/2021:00:38:24 -0400||400|-|-

x.x.x.x|27/Jun/2021:00:57:54 -0400|POST /boaform/admin/formLogin HTTP/1.1|301|http://45.79.94.20:80/admin/login.asp|Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0

Just constantly getting hammered by bots! Instead of printing out each line, let’s keep track of each unique URL by stuffing it into a hash. Then at the end, we can sort by the count for each one:

#!/usr/bin/perl

my %urls;

while (<>) {
    chomp;
    my ($remote_addr, $ident, $remote_user, $datetime,
        $request, $status, $body_bytes, $referer, $user_agent) =
        /^(\S+) (\S+) (\S+) \[([^]]+)\] "(.*)" (\d+) (\d+) "(.+)" "(.+)"$/;

    $urls{$request}++;
}

my @keys = sort { $urls{$b} <=> $urls{$a} } keys(%urls);
for (@keys) {
    printf "%4d  %s\n", $urls{$_}, $_;
}

Each URL request is tallied, then we sort by which has the most requests. Now, we can ask for the top 10 requests from a couple days ago:

$ zcat /var/log/nginx/access.log.3.gz | ./urls.pl | head -10
  59  GET / HTTP/1.1
  14  GET /.env HTTP/1.1
  13  POST / HTTP/1.1
  11  GET /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1
  11  GET /resume.html HTTP/1.1
  10  GET /robots.txt HTTP/1.1
   8  GET /favicon.ico HTTP/1.1
   8
   6  GET /assets/img/facebook.png HTTP/1.1
   6  GET /assets/css/monokai.css HTTP/1.1

Nice! OK, what if we want to get the top 10 across all the saved access logs? First, I’m going to dump them all into a file (that I’ll remove later):

$ cat /var/log/nginx/access.log{,.1} > access-temp
$ zcat /var/log/nginx/access.log.{2..14}.gz >> access-temp
$ wc -l access-temp
5359 access-temp

5,359 log entries to sort through?! Can our little script handle it? What are the top 10 most popular URLs going to be?

$ ./urls.pl < access-temp | head -10
1209  GET /phpmyadmin/ HTTP/1.1
1079  GET / HTTP/1.1
 206  GET /robots.txt HTTP/1.1
 178  GET /.env HTTP/1.1
 153  POST / HTTP/1.1
  96  GET /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1
  88  GET /resume.html HTTP/1.1
  71  GET /favicon.ico HTTP/1.1
  61
  57  GET /_ignition/execute-solution HTTP/1.1

I’m not surprised someone trying to attack phpMyAdmin is number 1, but I’m happy to see my resume still makes the top 10!
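As another aside, the same tally can be approximated without Perl at all. Since the request is the second field when you split a combined-format log line on double quotes, the classic sort | uniq -c pipeline works. A sketch, assuming the access-temp file built above:

```shell
# Split each line on '"'; field 2 is the request string.
# Count duplicates with uniq -c, sort by count descending, keep the top 10.
awk -F'"' '{print $2}' access-temp | sort | uniq -c | sort -rn | head -10
```

The Perl version is still nicer once you want more than one field, but for a quick look this pipeline is hard to beat.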

Most Visits

I’m also curious to know who visits my site the most (besides me). I want to use a similar regular expression to break apart each log entry, but this time I want to count how many requests they’ve made and which pages were requested the most. Let’s use C# this time:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

namespace CountVisits
{
   class Program
   {
      static void Main(string[] args)
      {
         var re = new Regex(@"(\S+) (\S+) (\S+) \[([^]]+)\] ""(.*)"" (\d+) (\d+) ""(.+)"" ""(.+)""");
         var visits = new Dictionary<string, int>();
         var urls = new Dictionary<string, Dictionary<string, int>>();

         foreach (var arg in args)
         {
            if (File.Exists(arg))
            {
               using (var reader = new StreamReader(File.OpenRead(arg)))
               {
                  while (!reader.EndOfStream)
                  {
                     var text = reader.ReadLine();
                     var matches = re.Matches(text);

                     if (matches.Count > 0)
                     {
                        var remoteAddress = matches[0].Groups[1].Value;
                        var identity = matches[0].Groups[2].Value;
                        var remoteUser = matches[0].Groups[3].Value;
                        var dateTime = matches[0].Groups[4].Value;
                        var request = matches[0].Groups[5].Value;
                        var status = matches[0].Groups[6].Value;
                        var bodyBytes = matches[0].Groups[7].Value;
                        var referer = matches[0].Groups[8].Value;
                        var userAgent = matches[0].Groups[9].Value;

                        if (!visits.ContainsKey(remoteAddress))
                        {
                           visits.Add(remoteAddress, 0);
                        }

                        visits[remoteAddress]++;

                        if (!urls.ContainsKey(remoteAddress))
                        {
                           urls[remoteAddress] = new Dictionary<string, int>();
                        }

                        if (!urls[remoteAddress].ContainsKey(request))
                        {
                           urls[remoteAddress].Add(request, 0);
                        }

                        urls[remoteAddress][request]++;
                     }
                  }
               }

               var sortedVisits = new List<KeyValuePair<string, int>>(visits);
               sortedVisits.Sort((KeyValuePair<string, int> a, KeyValuePair<string, int> b) => b.Value.CompareTo(a.Value));

               for (int i = 0; i < Math.Min(20, sortedVisits.Count); i++)
               {
                  Console.WriteLine("{0} {1}", sortedVisits[i].Value, sortedVisits[i].Key);

                  var sortedUrls = new List<KeyValuePair<string, int>>(urls[sortedVisits[i].Key]);
                  sortedUrls.Sort((KeyValuePair<string, int> a, KeyValuePair<string, int> b) => b.Value.CompareTo(a.Value));

                  for (int j = 0; j < sortedUrls.Count; j++)
                  {
                     if (j == 5)
                        break;

                     Console.WriteLine("\t{0} {1}", sortedUrls[j].Value, sortedUrls[j].Key);
                  }
               }
            }
            else
            {
               Console.WriteLine("File does not exist: {0}", arg);
            }
         }
      }
   }
}

You can see that, compared to Perl, the C# code is a bit longer but still pretty compact. Most of the work is handled by the regular expression near the top of Main. After that, we’re just counting occurrences. When printing, I show the top 20 visitors, each with their five most requested URLs. Here are the top 5 visitors (mostly bots):

$ dotnet run -- ../access-temp
1205 116.1.201.38
   1205 GET /phpmyadmin/ HTTP/1.1
591 45.146.165.123
   72 GET /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1
   55 GET /wp-content/plugins/wp-file-manager/readme.txt HTTP/1.1
   54 GET /index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=HelloThinkPHP21 HTTP/1.1
   54 GET /?XDEBUG_SESSION_START=phpstorm HTTP/1.1
   53 GET /console/ HTTP/1.1
255 119.29.99.56
   1 GET /robots.txt HTTP/1.1
   1 GET /Admin/Common/HelpLinks.xml HTTP/1.1
   1 GET /API/DW/Dwplugin/TemplateManage/login_site.htm HTTP/1.1
   1 GET /API/DW/Dwplugin/SystemLabel/SiteConfig.htm HTTP/1.1
   1 GET /Admin/Login.aspx HTTP/1.1
189 75.67.4.135
   19 GET /favicon.ico HTTP/1.1
   15 GET / HTTP/1.1
   14 GET /assets/css/styles.css HTTP/1.1
   10 GET /assets/css/monokai.css HTTP/1.1
   9 GET /assets/js/app.js HTTP/1.1
59 77.46.59.28
   10 GET / HTTP/1.1
   4
   2 GET //site/wp-includes/wlwmanifest.xml HTTP/1.1
   2 GET //wp2/wp-includes/wlwmanifest.xml HTTP/1.1
   2 GET //test/wp-includes/wlwmanifest.xml HTTP/1.1

Better Analytics

If we wanted to dig deeper into who’s visiting our pages and why, it would probably require storing cookies on the visitor’s machine. WordPress gives me tons of analytics about who visits this blog, but that’s because it probably uses cookies and a full database to track users. My static website doesn’t have–or need–those things.
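Short of adding cookies, one cheap metric the logs already give us is the number of distinct visitors per day. A quick sketch, using the fact that field 1 of the combined format is $remote_addr:

```shell
# Count unique client IPs in yesterday's log: take field 1,
# de-duplicate with sort -u, then count the remaining lines.
awk '{print $1}' /var/log/nginx/access.log.1 | sort -u | wc -l
```

It won’t distinguish humans from bots or two people behind the same NAT, but it’s a rough visitor count for free.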


Office Treasure!

I love going into our office and exploring our kitchen / break room / storage closet when I’m waiting for my coffee to brew. There are so many strange things we’ve held onto for who knows how long. Like today, I just noticed this nice Pelican case sitting behind the refrigerator:

What treasure could we be hiding in such a nice case??

A Sanyo XGA projector?! It’s a $400 case for a $40 projector! I love it! I wonder how long this thing will sit next to the fridge… We should really clean out a lot of this junk we’ve been hoarding.


Org Capture

I’ve been getting better at using Org Capture to keep a work journal (as in, actually remembering to write in it). It was especially useful during Crestron Masters for keeping everything organized. Org is a very powerful tool, and I’ve found it best to start with the parts that are easy to understand and slowly build from there. For me, that’s logging journal entries.

I stumbled onto Org by reading Sacha Chua’s blog. She has an excellent post on taking notes more efficiently in Org. I’ve set myself up with three files to organize my thoughts:

  • journal.org – This is where I jot down quick ideas or reflect on something. I typically post links to tasks here as I’m working on them.
  • tasks.org – This is where I track progress on project work. Anything that could potentially have a TODO placed next to it goes here, then I link it into my journal so I remember the context for adding it.
  • notes.org – This is where I organize study notes on different subjects. It’s also where I’ll write down passwords or secrets if I’m using any type of third-party service.

So far this seems to be working well. It would be great if I could write everything in Org and export as needed. This might work well with my developer site, but I don’t know how I would be able to pull in posts and format them correctly in WordPress. There might be plugins that can handle that?

Org has helped me write more (one of my goals for 2020). I’ve been writing in my journal a few times every day since April 16th, 2021, and I’m already up to 16K words (plus another 7K in my notes). Org lets me tag entries too so I can quickly return to them. For example, I put masters on everything from Crestron Masters this year:

C-c / m lets you find tagged entries.

So if you’re looking for a good note-taking system, I’d recommend Org Mode in Emacs.


NetLinx: Getting Started

I want to write a few posts about programming AMX NetLinx controllers. While I started my career programming AMX systems, I’m lucky now if I see one or two in a year. Strangely, every time I’ve started with (or returned to) a company, I’ve been handed a NetLinx system to figure out. It’s a sort of welcome back to AV: programming in the NetLinx language is well-suited to automation tasks, and something about it brings me joy.


Mechanical Keyboard

I just got a new Logitech K840 Mechanical Keyboard because I noticed some of the keys were sticking on my old Microsoft one. Granted, it was 8 years old, but it still worked reasonably well. So far I’m happy with the new one; it definitely feels and sounds different, sort of like the first PC keyboard I can remember. I like that! It doesn’t have Num Lock or Scroll Lock indicators, which is kind of weird, but I guess those aren’t keys I hit very often anyway. I’ll probably post an update after I’ve used it for a while.