Improving my NAS monitoring

I've decided to improve my NAS monitoring scripts a bit. You know, stuff you usually do on a Saturday.

In previous posts (this and this) I've played a bit with what I could do on my NAS by reverse engineering some of its binaries. However, something I've wanted to do for a while is to improve my current, rather poor monitoring script. The way it works today is this:

  1. Fetch one specific piece of information
  2. Publish it using MQTT on Docker
  3. Fetch another specific piece of information
  4. Publish it using MQTT on Docker
  5. (you can see where this is going, right?)

However, this is slow and annoying, as every single publish requires a Docker container to be created just to send one message. It also sometimes messes up the OS's shutdown procedure, which requires Docker to be stopped. This is a lesson I learned the hard way after days of debugging everything:

Turns out you can't really stop Docker if you keep restarting it every minute to spin up a new container. Oops!

If someone is interested, this is how it works right now:

#!/opt/bin/bash

MQTT_HOST=something
MQTT_USER=somethingelse
MQTT_PASS=somethingelseagain
MQTT_PREFIX=vault

## ========================

# HACK: check for docker iptables rules. This avoids running the script if Docker is shutting down.
RULE_COUNT=$(iptables -L DOCKER 2>/dev/null | wc -l)
if [[ "$RULE_COUNT" -lt 3 ]]; then
  echo "Docker rules are probably missing, aborting script!"
  exit 0
fi

# This function will publish our data.
function publish {
  echo "Publishing $MQTT_PREFIX/$1 -> $2"
  /usr/local/bin/docker run --rm eclipse-mosquitto \
    mosquitto_pub -u $MQTT_USER -P $MQTT_PASS -h $MQTT_HOST -t $MQTT_PREFIX/$1 -m $2
}

# Get the JSON information from sysinfo.cgi, which we studied before.
JSON=$(sh ./fakecgi.sh)

# Extract CPU temperature
CPUTEMP=$(echo $JSON | /opt/bin/jq -r ".cputemp")
publish cpu/temp $CPUTEMP

# Extract system temperature
SYSTEMP=$(echo $JSON | /opt/bin/jq -r ".systemp")
publish sys/temp $SYSTEMP

# Extract fan 0 speed
FAN0SPEED=$(echo $JSON | /opt/bin/jq -r ".fan_speed[0]")
publish fan0/speed $FAN0SPEED

# Extract each disk temperature
DISKS=$(ls /dev/sd* | grep -E "sd[a-z]$" | grep -v sde)
i=1
for DISK in $DISKS; do
  HDDTEMP=$(smartctl -A $DISK | grep Temperature_Celsius | cut -d"-" -f2 | xargs | cut -d" " -f1)
  publish hdd$i/temp $HDDTEMP
  i=$((i+1))
done

So, let's begin by listing our goals:

  1. Be able to execute the OS's CGI binaries without modifying any library like we did before. This makes the code more resilient to updates.
  2. Use the original CGI binaries to extract information instead of doing it myself. I mean, why not, right? Since Asustor went to the trouble of writing the code, why not use it?
  3. Maybe run this from within a single, always-running Docker container. Either this or remove Docker from the monitoring script (i.e. install the MQTT client on the NAS itself).

First goal: hijack CGI binaries!

So, let's have some more fun with sysinfo.cgi. Firing it up in Ghidra and following the entry point, we can learn a bit about how it works. First of all, it requires an environment variable called QUERY_STRING, which must set a few parameters it'll use later: sid (user token), act (action), _dc (a timestamp, but probably only for caching?). Anyway, here's the code:
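Before the decompiled code, a quick sketch of what such a QUERY_STRING could look like when built from a script. The build_query helper is my own illustration, not something that exists on the NAS, and it assumes plain key=value pairs joined by & with no URL-encoding, which should be enough for simple values like these.

```shell
# Hypothetical helper: join key=value arguments with "&" into a query string.
# No URL-encoding is done, so this only suits simple values.
build_query() {
  query=""
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query=$pair
    else
      query="$query&$pair"
    fi
  done
  printf '%s\n' "$query"
}

# Example: an action plus a timestamp for _dc (assuming any timestamp will do).
build_query "act=sys" "_dc=$(date +%s)"
```

The result can then be exported as QUERY_STRING before calling the binary.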

int FUN_00401e20(void)
{
  int retval2;
  long retval1;
  char *param_value;
  O_CGI_AGENT cgiagent [224];
  
  O_CGI_AGENT::O_CGI_AGENT(cgiagent,0);
                    /* try { // try from 00401e35 to 00401e5f has its CatchHandler @ 0040203b */
  retval2 = O_CGI_AGENT::Revive_Session();
  if (retval2 == 0) {
                    /* try { // try from 00401e7f to 00402033 has its CatchHandler @ 0040203b */
    retval1 = O_CGI_AGENT::Get_Param(cgiagent,"act");
    if (retval1 != 0) {
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"sys");
        if (retval2 == 0) {
          O_CGI_AGENT::Print_Content_Type_Header();
          retval2 = FUN_00402160(cgiagent);
          goto LAB_00401e62;
        }
      }
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"net");
        if (retval2 == 0) {
          O_CGI_AGENT::Print_Content_Type_Header();
          retval2 = FUN_00402750(cgiagent);
          goto LAB_00401e62;
        }
      }
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"wan");
        if (retval2 == 0) {
          O_CGI_AGENT::Print_Content_Type_Header();
          retval2 = FUN_00402930(cgiagent);
          goto LAB_00401e62;
        }
      }
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"doctor");
        if (retval2 == 0) {
          O_CGI_AGENT::Print_Content_Type_Header();
          retval2 = FUN_00403810(cgiagent);
          goto LAB_00401e62;
        }
      }
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"initial");
        if (retval2 == 0) {
          O_CGI_AGENT::Print_Content_Type_Header();
          retval2 = FUN_004020f0(cgiagent);
          goto LAB_00401e62;
        }
      }
      param_value = (char *)O_CGI_AGENT::Get_Param(cgiagent,"act");
      if (param_value != (char *)0x0) {
        retval2 = strcmp(param_value,"collect");
        if (retval2 == 0) {
          retval2 = FUN_004039c0(cgiagent);
          goto LAB_00401e62;
        }
      }
    }
    O_CGI_AGENT::Print_Content_Type_Header();
    puts("{ \"success\": false, \"error_code\": 5301 }");
  }
  else {
    O_CGI_AGENT::Print_Content_Type_Header();
    if (retval2 == -0x100) {
      puts("{ \"success\": false, \"error_code\": 5053 }");
    }
    else {
      puts("{ \"success\": false, \"error_code\": 5000 }");
    }
  }
  retval2 = 0;
LAB_00401e62:
  O_CGI_AGENT::~O_CGI_AGENT(cgiagent);
  return retval2;
}

Honestly, it took me a while to figure this out. Ghidra had a hard time figuring some stuff out as the CGI library is C++, so method calls on those instances look very weird. Anyway, the way it works is very simple:

  1. Revive the user session (remember the sid?)
  2. (Probably) make sure we have an action to do (the act parameter)
  3. Test which action we want to perform and jump to the matching function
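To make the third step easier to read than the wall of strcmp calls, here is the same dispatch table written as a shell sketch. The function names are just Ghidra's auto-generated labels from the listing above, and dispatch_act itself is a made-up name for illustration; it only reports which handler would run.

```shell
# Sketch of the dispatcher: map the "act" value to the handler the
# decompiled code jumps to. Handler names are Ghidra's generated labels.
dispatch_act() {
  case "$1" in
    sys)     echo "FUN_00402160" ;;
    net)     echo "FUN_00402750" ;;
    wan)     echo "FUN_00402930" ;;
    doctor)  echo "FUN_00403810" ;;
    initial) echo "FUN_004020f0" ;;
    collect) echo "FUN_004039c0" ;;
    # Missing or unknown action: the binary prints this error instead.
    *)       echo '{ "success": false, "error_code": 5301 }' ;;
  esac
}

dispatch_act sys      # FUN_00402160
dispatch_act bogus    # falls through to the 5301 error
```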

We previously hijacked this code by replacing the path of the session file in the binary. Let's take a different approach here and hijack the Revive_Session method. The way we do this is very simple: we build a library that exports a function with the same name and point LD_PRELOAD to it, which forces our custom lib to be loaded first so that its version of the function wins over the original one.

Easy, right? :) [insert here the happy-crying emoji]

However, there's a problem. Revive_Session is a method of the O_CGI_AGENT class, not a global function. This changes things a bit. But even though this is a method, it must be exported as a function somehow. Let's take a look at all the strings in libcgi.so containing the method name. We can also demangle the function names with c++filt:

# for fun in $(strings /mnt/lib/libcgi.so | grep Revive_Session); do echo $fun $(c++filt $fun); done
_ZN11O_CGI_AGENT14Revive_SessionEi O_CGI_AGENT::Revive_Session(int)
_ZN11O_CGI_AGENT14Revive_SessionEv O_CGI_AGENT::Revive_Session()
#
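If those names look like line noise, this little sketch splits them by hand. It assumes the simple nested-name layout used by the Itanium C++ ABI (a _ZN prefix, length-prefixed components, an E, then the parameter type codes) and nothing fancier; split_mangled is a throwaway name of mine, and c++filt remains the right tool for anything real.

```shell
# Split a simple Itanium-ABI nested name of the form
#   _ZN <len><name> <len><name> ... E <parameter codes>
# Templates, substitutions and other features are deliberately not handled.
split_mangled() {
  rest=${1#_ZN}
  if [ "$rest" = "$1" ]; then
    echo "not a nested mangled name" >&2
    return 1
  fi
  name=""
  while :; do
    # Each component starts with its length in decimal digits.
    len=$(printf '%s' "$rest" | sed 's/^\([0-9]*\).*/\1/')
    [ -n "$len" ] || break
    rest=${rest#"$len"}
    part=$(printf '%s' "$rest" | cut -c "1-$len")
    rest=$(printf '%s' "$rest" | cut -c "$((len + 1))-")
    name="${name:+$name::}$part"
  done
  # "E" ends the nested name; what follows encodes the declared parameter
  # types ("i" is int, "v" means no parameters).
  echo "$name params=${rest#E}"
}

split_mangled _ZN11O_CGI_AGENT14Revive_SessionEi
split_mangled _ZN11O_CGI_AGENT14Revive_SessionEv
```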

The one we want is the first one, or at least that is the one I would guess we want. You see, once compiled, there's no such thing as classes and methods: everything is just plain code. Based on my experience debugging C++ code (so many traumas), the way you (usually) call a method on a class is by simply passing its instance as the first argument. The method code is not replicated on each instance (plus normally you wouldn't be able to run such code as it's program data and not code, but that's a whole different story); instead there is a single copy of it, which receives the instance's pointer as a hidden first parameter. That pointer isn't part of the mangled name, though: the trailing i is an explicit int parameter and v means no parameters at all, so both overloads receive the instance pointer. I went with the int variant, so let's hijack that function. The code is very simple:

#include <stdlib.h>
#include <stdio.h>

int _ZN11O_CGI_AGENT14Revive_SessionEi(void *this) {
  return 0;
}
We most likely don't need those includes, but I forgot to remove them.

This will override the Revive_Session method and return 0 instead, which is expected by the CGI code for the session to be accepted. Building it is a simple gcc hijack.c -o hijack.so -shared away. But does it work?

# QUERY_STRING="act=sys" LD_PRELOAD=../ubuntu/hijack.so /volume0/usr/builtin/webman/portal/apis/information/sysinfo.cgi
Content-type: text/plain; charset=utf-8

{ "success": true,"model":"AS3104T", "cpu":"Intel® Celeron™ CPU @ 1.60GHz", "mem":"2048", "serial":"(...)", "os":"3.5.3.RBH1", "timezone":"(GMT-03:00) Brasilia", "time":"1615658158", "year":"2021", "month":"3", "day":"13", "hour":"14", "minute":"55", "DateFormat":"1", "TimeFormat":"1", "uptime":"577427", "cputemp":"61", "bios":"2.23", "bootloader":"", "fan_speed": ["524"], "systemp":"40","aid":""}

Yes, it does! Awesome, we just hijacked the session control! The first goal is done, we can now call the sysinfo CGI binary as much as we want without modifying any library at all!🎉

You might be wondering how this is any different from before. Previously we modified libcgi.so itself to make this work, by making it point to a different sessions file. This change, however, keeps the original library, which is probably a bit more resistant to OS updates. Also, it doesn't require creating a temporary fake file to make it work. Finally, this is a bit easier to run, as it only requires 2 environment variables. So, it's worth it!
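Since everything boils down to those two environment variables, the call can be wrapped in a small helper. This is only a sketch: run_cgi and HIJACK_LIB are names I made up, and it assumes the response always has its headers separated from the body by a blank line, as in the output above.

```shell
# Hypothetical wrapper: run a CGI binary with the session check bypassed
# and print only the response body.
#   $1 = query string, remaining arguments = the CGI command to run
# HIJACK_LIB (assumed name) points at the preload library; when unset,
# ./hijack.so is used.
run_cgi() {
  query=$1
  shift
  LD_PRELOAD="${HIJACK_LIB-./hijack.so}" QUERY_STRING="$query" "$@" |
    sed '1,/^[[:space:]]*$/d'   # drop the headers up to the first blank line
}
```

With that in place, the earlier test becomes run_cgi "act=sys" /volume0/usr/builtin/webman/portal/apis/information/sysinfo.cgi, and the output can be piped straight into jq.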

Second goal: using the existing CGI binaries to extract all data

Our monitoring script looks at three main things: system temperatures, fan speeds and HDD SMART information. The first two can be extracted by calling the sys action on the sysinfo.cgi binary. The last one, however, requires calling a different binary. On the NAS interface itself we can see the temperature for each drive, which means it must have an API to do that. And, in fact, it does:

That was easier than I thought.

We can then apply the same concept to different CGI binaries, such as the disk_smart one:

# QUERY_STRING="act=list" LD_PRELOAD=../ubuntu/hijack.so /volume0/usr/builtin/webman/portal/apis/storageManager/disk_smart.cgi | grep -v Content-type | jq
{
  "success": true,
  "disks": [
    {
      "did": 1,
      "dev": "sda",
      "hdd_status": "sync",
      "volume": "volume1",
      "model": "(...)",
      "serial": "(...)",
      "size": "976762584",
      "temp": 45,
      "rotated": "true",
      "nvme": "false",
      "smart_status": "Healthy",
      "state": "normal",
      "error": "none",
      "4k": false,
      "badblock": "no_error"
    },
    (...)
  ]
}

Based on this, we can easily rewrite our monitoring script to use only the original CGI binaries just like this:

#!/opt/bin/bash

MQTT_HOST=nice
MQTT_USER=try
MQTT_PASS=hehe
MQTT_PREFIX=vault

## ========================

function get_sysinfo {
  LD_PRELOAD=./hijack.so QUERY_STRING="act=sys" /volume0/usr/builtin/webman/portal/apis/information/sysinfo.cgi | grep -v "Content-type"
}

function get_disk_smart {
  LD_PRELOAD=./hijack.so QUERY_STRING="act=list" /volume0/usr/builtin/webman/portal/apis/storageManager/disk_smart.cgi | grep -v "Content-type"
}

function publish {
  echo "Publishing $MQTT_PREFIX/$1 -> $2"
  /usr/local/bin/docker run --rm eclipse-mosquitto \
    mosquitto_pub -u $MQTT_USER -P $MQTT_PASS -h $MQTT_HOST -t $MQTT_PREFIX/$1 -m $2
}

SYSINFO=$(get_sysinfo)
HDDSMART=$(get_disk_smart)

CPUTEMP=$(echo $SYSINFO | /opt/bin/jq -r ".cputemp")
publish cpu/temp $CPUTEMP

SYSTEMP=$(echo $SYSINFO | /opt/bin/jq -r ".systemp")
publish sys/temp $SYSTEMP

FAN0SPEED=$(echo $SYSINFO | /opt/bin/jq -r ".fan_speed[0]")
publish fan0/speed $FAN0SPEED

HDDTEMPS=$(echo $HDDSMART | jq -r '.disks[] | (.did|tostring) + " " + (.temp|tostring)')
while read did temp; do
  publish hdd$did/temp $temp
done < <(echo $HDDTEMPS | xargs -n2)

Third goal: rewriting our script!

There are two approaches here:

  1. Keep our script on the NAS itself and just install an MQTT client on it, adding it to our crontab.
  2. Move our script into a container, add the right permissions for it to run and just keep it there on a loop with a sleep.

Both approaches have their pros and cons, but I've decided to take a look at both. We want something simple, like a drop-in script that I can easily manage, adding and removing features at any moment without too much trouble. The "drop-in" part makes me want to run this in a container, but figuring out which libraries are required to run sysinfo.cgi and disk_smart.cgi inside the container is just too much pain. I would need to either copy or mount them so that the container could use them. This is just too much effort for something that does not require a container at all; after all, the libraries used here sometimes do raw I/O, as we have seen before.

So, turning our attention to running it locally on the NAS, we first need an MQTT client. A simple opkg install mosquitto-client adds the required software to our system straight from Entware's repositories. The only difference now is that this client supports multiple versions of the MQTT protocol, so we must specify the correct one with -V mqttv5. No harm done there. We also need to add the script to our crontab:

* * * * *  (cd /volume1/.@root/monitor; /opt/bin/bash monitor.sh)
Bash is also installed from Entware, hence the weird path.
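Given the earlier mess with jobs firing every minute, one optional extra is a guard that skips a run when the previous one hasn't finished yet. This is a minimal sketch and not part of my actual script: run_locked and LOCK_DIR are made-up names, and it relies on mkdir being atomic, so only one process can create the lock directory.

```shell
# Hypothetical guard: run a command only if no other run holds the lock.
# LOCK_DIR is an assumed location; any writable path works.
LOCK_DIR="${LOCK_DIR:-/tmp/monitor.lock}"

run_locked() {
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "Previous run still active, skipping." >&2
    return 1
  fi
  "$@"
  status=$?
  rmdir "$LOCK_DIR"
  return $status
}
```

The script's body would then go into a function that is started with run_locked. Note that if the script gets killed mid-run, the lock directory stays behind and has to be removed by hand.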

So, the final script is this:

#!/opt/bin/bash

MQTT_HOST=never
MQTT_USER=gonna
MQTT_PASS=give
MQTT_PREFIX=youup

MOSQUITTO_PUB=/opt/bin/mosquitto_pub
ASUSTOR_APIS=/volume0/usr/builtin/webman/portal/apis

## ========================

function get_sysinfo {
  LD_PRELOAD=./hijack.so QUERY_STRING="act=sys" $ASUSTOR_APIS/information/sysinfo.cgi | grep -v "Content-type"
}

function get_disk_smart {
  LD_PRELOAD=./hijack.so QUERY_STRING="act=list" $ASUSTOR_APIS/storageManager/disk_smart.cgi | grep -v "Content-type"
}

function publish {
  echo "Publishing $MQTT_PREFIX/$1 -> $2"
  # Quote everything so passwords or values with spaces or special characters survive.
  "$MOSQUITTO_PUB" -V mqttv5 -u "$MQTT_USER" -P "$MQTT_PASS" -h "$MQTT_HOST" -t "$MQTT_PREFIX/$1" -m "$2"
}

SYSINFO=$(get_sysinfo)
HDDSMART=$(get_disk_smart)

CPUTEMP=$(echo $SYSINFO | /opt/bin/jq -r ".cputemp")
publish cpu/temp $CPUTEMP

SYSTEMP=$(echo $SYSINFO | /opt/bin/jq -r ".systemp")
publish sys/temp $SYSTEMP

FAN0SPEED=$(echo $SYSINFO | /opt/bin/jq -r ".fan_speed[0]")
publish fan0/speed $FAN0SPEED

HDDTEMPS=$(echo $HDDSMART | jq -r '.disks[] | (.did|tostring) + " " + (.temp|tostring)')
while read did temp; do
  publish hdd$did/temp $temp
done < <(echo $HDDTEMPS | xargs -n2)
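One gap worth mentioning: if a CGI call fails or a key goes missing after a firmware update, jq -r prints null (or nothing at all), and the script would happily publish that. A possible guard, sketched here under the assumption that it sits right after the publish function from the script above (safe_publish is a made-up name):

```shell
# Hypothetical guard around publish: skip values that are empty or "null"
# (jq -r prints "null" when a key is missing from the JSON).
safe_publish() {
  case "$2" in
    ""|null)
      echo "Skipping $1: no value" >&2
      return 1
      ;;
  esac
  publish "$1" "$2"
}
```

Swapping the publish calls for safe_publish would then keep bogus readings out of the broker.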

This is it. Here's what we accomplished:

  1. Found a way of bypassing authentication when running CGI binaries locally.
  2. Modified the script to use those same binaries to provide the information we want (it's just easier).
  3. Removed the Docker dependency from the script, making it run way faster than before without stalling the system shutdown process.

The data is processed on my Home Assistant setup, which will handle any kind of alerts I might need (I don't have any yet, oops!). It works as a centralized monitor for anything I'm hosting at home, so running MQTT was just the easiest thing I could do to feed data into it. If it works, it ain't dumb!

That's far enough for this Saturday! Now it's time to have some fun taking a deeper look at some other features on this NAS and its custom OS, for which I just downloaded the firmware update image... hehehe 😈
