TrueNAS

Latest revision as of 12:32, 24 March 2025

Overview

Users

As of v24, TrueNAS sends alerts if you log in to the web UI as root, so I set up another user to log in with. I had to deal with these issues:

Even though everything was configured, I could not ssh in until I added the auxiliary group "builtin_administrators" to the user.
Also, do not disable password login for this user: a user with password login enabled is required to log in to the web UI.
Home dirs are now properly located at

/mnt/safe/safe-ds/software/apps/hive-home

I could not connect to samba shares from bandit after upgrading bandit to Ubuntu 24.04; finally got it working by checking the samba box on the user! wtf

hive > Credentials > Users > m > Edit > check SMB User (and re-provide the same password)

Pools

TrueNAS provides storage via Pools. A pool is a bunch of raw drives gathered and managed as a set. Each of my pools is one of these types:

Pool type	Description
single drive	no TrueNAS advantage other than health checks
raid1 pair	mirrored drives give normal write speeds, fast reads, single-fail redundancy; costs half of the storage potential
raid0 pair	striped drives give fast writes, normal reads, no redundancy, no storage cost
raid of multiple drives	raidz: optimization of read/write speed, redundancy, storage potential

The three levels of raidz are:

raidz: one drive's worth of capacity is consumed just for parity (no data storage, ie you only get (n-1) drives of storage total), and one drive can be lost without losing any data; fastest; very dangerous to recover from a lost drive (the "resilver" process is brutal on the remaining drives - don't wait)
raidz2: two drives for parity, two can be lost
raidz3: three drives for parity, three can be lost; slowest
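
As a rough check on the storage cost of each level, usable capacity is approximately (drives - parity drives) x drive size. A minimal sketch: the function name raidz_usable and the seven 8TB drives are illustrative assumptions, and a real pool loses a bit more to metadata and padding.

```shell
#!/bin/sh
# raidz_usable DRIVES SIZE_TB PARITY   (hypothetical helper)
# Approximate usable capacity in TB: (drives - parity) * size.
# PARITY is 1 for raidz, 2 for raidz2, 3 for raidz3.
raidz_usable() {
    echo $(( ($1 - $3) * $2 ))
}

raidz_usable 7 8 1   # raidz
raidz_usable 7 8 2   # raidz2
raidz_usable 7 8 3   # raidz3
```

For seven 8TB drives this prints 48, 40 and 32.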

Datasets

Every pool should have one child dataset. This is where we set the permissions, important for SAMBA access. We could have more than one child dataset, but I haven't had the need.

Adding

hive > Storage > Pools > mine (or any newly created pool) > Add Dataset

Dataset settings:

name #pool#-ds
share type SMB

Save, then continue...

hive > Storage > Pools > mine (or any newly created pool) > mine-ds > Edit ACL
user m
group m
ACL
 who everyone@
 type Allow
 Perm type Basic   (NOTE: "Perm type Basic" is important!)
 Perm Full control (NOTE: this is not the default, you will need to change it)
 Flags type Basic
 Flags Inherit     (NOTE: this is not the default, you will need to change it)
(REMOVE on all other blocks)
SAVE

Windows SMB Shares

Share each dataset as a Samba share under:

Sharing > Windows Shares (SMB)

Use the pool name for the share name.
Use the same ACL as for the dataset.
Purpose: No presets

WARNING: I had to set these Auxiliary Parameters in the SMB config so that symlinks would be followed.

Services > SMB > Actions > configuration > Auxiliary Parameters:

unix extensions = no
follow symlinks = yes
wide links = yes

Stop and restart SMB service

Maintenance

Fix any pool errors ASAP

Once you get a pool error, you should immediately assess it so it doesn't corrupt data and make things worse.

It will be an error with the drive, the connectors to the drive, or a sun flare (seriously tho! like a power spike or VM termination that caused bad data).

First thing, check the drive. Use smartmontools (smartctl), and if you see any issue there, better pull the drive, replace it, and do a full badblocks test on the pulled drive using the external SATA to USB mounter. Any problems, chuck it.

If you have a bad drive, you will need to replace it.

Whether you replace the drive, or ensure it is good, you now have to clean the pool of the bad data (currently the bad data exists, and will persist!).

From CLI:

sudo zpool status -v mine
# rm all the files that are reported with errors - they are BAD
sudo zpool clear mine
# IMMEDIATELY after clearing errors, go into the UI and start a SCRUB!
# Repeat as needed

With any luck this will clear up pool errors.
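
To pull just the damaged file paths out of the status report, something like this can help. It is a sketch that assumes the usual `zpool status -v` layout, where the files are listed after the "Permanent errors have been detected" line. list_error_files is a made-up helper name, and entries that are not plain paths (such as dataset:<0x...> object ids) are skipped.

```shell
#!/bin/sh
# list_error_files (hypothetical helper): read `zpool status -v` output on
# stdin and print only the absolute file paths from the permanent-errors list.
list_error_files() {
    awk '
        /Permanent errors have been detected/ { in_list = 1; next }
        # keep lines whose first field is an absolute path; strip the indent
        in_list && $1 ~ /^\// { sub(/^[ \t]+/, ""); print }
    '
}

# Usage (on hive):
#   sudo zpool status -v mine | list_error_files
```

Review the list before deleting anything; the helper only prints paths, it does not remove files.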

Regularly do SMART and scrub, and manual trim

YOU MUST DO THIS REGULARLY!

Schedule weekly scrubs
Schedule weekly SMART tests
Occasionally manually do a trim on the pool holding any suss drive, eg:

sudo zpool trim mine

Fix any errors as quickly as possible, using zpool clear, scrub, drive replace, resilver

From here:

A drive, vdev or pool is declared degraded if ZFS detects problems with the data. If you reboot the error count is reset. A resilver will heal the data errors if there is sufficient redundancy. ZFS will only spot the data issues on read, that’s why we have scrubs, a forced read of all the data to try and determine if there are any errors. So schedule regular scrubs are important. This will not tell you why the data is corrupted, for this you have S.M.A.R.T tests, you need to schedule those as well, both long and short.

Burn in a new drive

ALWAYS do this even tho it's a PITA. Less pain than not doing it.

I didn't do it for my 7-8TB-drive raidz. Murphy said FUCK YOU and one of the eight went bad. So... do the test, dumbass.

But of course I found a way to stay lazy... TrueNAS has the ability to run SMART tests directly on a drive, so do it there. Or just wait for SMART failures to show up. God damn, laziness rules. Maybe. Fool.
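
If you do decide to burn in from a shell, a typical sequence is a long SMART self-test followed by a destructive badblocks write pass. A sketch only: burn_in_plan is a hypothetical helper that prints the commands rather than running them, because badblocks -w ERASES the drive. /dev/sdX is a placeholder, and the -b 4096 block size is an assumption for large drives.

```shell
#!/bin/sh
# burn_in_plan DEVICE (hypothetical helper)
# Prints the burn-in commands for one drive; it does not run them.
burn_in_plan() {
    dev="$1"
    # 1. start a long SMART self-test (runs on the drive; wait for it to finish)
    echo "smartctl -t long $dev"
    # 2. destructive write-mode surface test with progress and verbose output
    echo "badblocks -b 4096 -wsv $dev"
    # 3. review SMART attributes and the self-test log afterwards
    echo "smartctl -a $dev"
}

burn_in_plan /dev/sdX
```

Run the printed commands by hand, as root, only on a drive that holds nothing you care about.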

Pool speed check

CAST to SAFE: ~114MB/s write (compressed) on 60MB/s network

Do this to test raw write speed from anywhere on the LAN to the [safe] pool:

dd if=/dev/zero of=/mnt/safe/safe-ds/speedtest.data bs=4M count=10000
# on hive: 4GB transferred in ~15sec at ~2.9GB/sec, WOW
# on cast: 42GB copied in 371sec at 114MB/s - that seems in line with my network speed (see below)

To test the network bandwidth limit:

# on hive
iperf -s -w 2m # run in server mode with a 2MB TCP window
# on another LAN machine
iperf -c hive -w 2m -t 30 -i 1 # 30-second test, report every second
# on cast: 1.51 GB at 477Mbits/sec aka 60MB/sec
# I have a 1Gb switch, i guess that's all we get out of it?
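
To compare the iperf and dd numbers, convert between network units (Mbit/s) and disk units (MB/s); there are 8 bits per byte. A small sketch, with made-up helper names:

```shell
#!/bin/sh
# Hypothetical helpers: convert between Mbit/s and MB/s (8 bits per byte).
mbit_to_mbyte() { awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 8 }'; }
mbyte_to_mbit() { awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 8 }'; }

mbit_to_mbyte 477    # the iperf result from cast
mbit_to_mbyte 1000   # theoretical ceiling of a 1Gb link
mbyte_to_mbit 114    # the dd result from cast
```

This prints 59.6, 125.0 and 912. So 477Mbits/sec is a bit under half of what a 1Gb link can carry in theory (125MB/s before protocol overhead), while the 114MB/s dd figure works out to 912Mbits/sec.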

Replace a bad disk in a pool

Watch TrueNAS for CRITICAL alerts that indicate a drive is failing its self-tests. You can check SMART results yourself to make sure it is actually bad.
Make note of its serial number.
Find the drive in the pool, make note of its drive id (not needed but no harm).
Change the pool drive status from FAULTED to OFFLINE

Storage > Pools > badpool > triple-dot Status > baddrive > triple-dot-status > FAULTED to OFFLINE

Power down the whole fucking PROXMOX machine
Pull it, and swap out bad drive for good
Replace it; this will automatically start a resilver (buckle up for that, it takes FOREVER and will POUND your disks)

Storage > Pools > badpool > baddrive > triple-dot-status > REPLACE

The resilver SHOULD be okay if you have vetted your drives well. If the resilver seems to go bad, or stop, or pause, or ANYTHING... you can try restarting TrueNAS. This actually clears a lot of error marking and may actually be a required step. Crazy TrueNAS.
Once resilver is done, you will probably need to clear pool errors.
Run a full badblocks on the faulted drive and ensure it was the drive at fault (not connectors or something else)
Return the drive if under warranty; otherwise, RECYCLE IT

NOTE on raid level

My 7-drive raidz arrays can only lose ONE drive before they go boom, so you MUST replace bad disks immediately. raidz2 spends two drives on parity and raidz3 spends three, but for SSDs, raidz with its you-can-lose-one-drive tolerance is, to me, a sweet spot.

Remove a bad pool

Make note of which drives use the pool; likely some are bad and some are good and perhaps worth reusing elsewhere.
Disconnect SMB connections to the pool
- Update valid shares in mh-setup-samba-shares
- Rerun mh-setup-samba-shares everywhere (eventually anyway)
- One possible easier way to get SMB disconnected from the pool is to stop SMB service in TrueNAS
- Sadly, to get through this for my splat pool, I had to remove pool, fail, restart hive, remove pool.
Pool > (gear) > Export/disconnect
- [x] Delete configuration of shares that use this pool (to remove the associated SMB share)
- [x] Destroy data on this pool (you MUST select this or the silly thing will attempt to export the data)

Update TrueNAS

Updating is baked into the UI, nice! And I have auto-updates enabled. So nice.

These guys work hard on this, to make sure releases are well tested. Watch for alerts about newly available updates. Do not update past the current release!

System > Update > [Train] (ensure you have a good one selected; on occasion, you'll want to CHANGE it to select a newer stable release!)
Give the system a minute to load available updates...
Press Download available updates > The modal will ask if you want to apply and restart > Say yes

That's about it!

Upgrade from CORE to SCALE

This is huge: it changes from a FreeBSD system to a Linux system. SCALE has been around long enough that it's time to switch!

Change the train to SCALE ElectricEel 24.10 [release]
Let it upgrade - HOLY HELL what a nice ride!
- It set up a new grub menu, preserving the old CORE install, wow.
- It started up flawlessly!
I got advice to disallow root login
- I went on a bender, adding a new user, moving home dir, and spent the time to get it configured as admin. See #Users notes for details.
Re-set hostname

Network > Global Configuration > Settings > change hostname > Save

Configuration

Set up user

I set up m user (1000) and m group (1000)

Set up alert emails

Go to one of your google accounts to get an App password. It has to be an account that has 2fa turned on, bleh, so don't use [email protected]. I went with [email protected].

Accounts > Users > root > edit password > [email protected]
System > Email > from email > [email protected], smtp.gmail.com 465 Implicit SSL, SMTP auth: (email/API password)

Then you can test it here:

System > Email > (at bottom, next to Save...) Send Test Email

Set up user ssh

This was not fun.

Set up user
You have to set password ON and make sure to check [x] Allow sudo
Make sure to allow Samba Authentication for m user that is used for samba
Add public key to user
Create a valid folder on the /mnt NAS shares for the user's home; you can mkdir using samba; I created:

/mnt/safe/safe-ds/software/apps/hive-home

set the user's home to that ^; turn off password auth
Turn on SSH service
System > SSH Keypairs > Add SSH keypair for main user m
System > SSH Connections > Add, use localhost, keypair from prev step

It should work but it does not!

Open a TrueNAS prompt via proxmox console
Go to the home dir, there should be an .ssh there now
Reduce permissions on both HOME DIR (700) and .ssh/KEY (400)
Get a shell and run `sudo visudo` and add this line:

m ALL=(ALL) NOPASSWD: ALL

Finally! It works!
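
The permission step above can be sketched as a small helper. fix_ssh_perms is a made-up name, tightening .ssh itself to 700 is an assumption (the notes only mention the home dir and the key), and authorized_keys is assumed to be the key file in question.

```shell
#!/bin/sh
# fix_ssh_perms HOME_DIR (hypothetical helper)
# With its default StrictModes setting, sshd rejects key logins when the
# home dir or key files are writable by group or others.
fix_ssh_perms() {
    home="$1"
    chmod 700 "$home" || return 1
    if [ -d "$home/.ssh" ]; then
        chmod 700 "$home/.ssh" || return 1
        # 400 on the key file, as in the notes above
        if [ -f "$home/.ssh/authorized_keys" ]; then
            chmod 400 "$home/.ssh/authorized_keys" || return 1
        fi
    fi
}

# Usage (on hive):
#   fix_ssh_perms /mnt/safe/safe-ds/software/apps/hive-home
```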

Troubleshooting

SOME of my shares were throwing Permission Denied errors on mv. Solutions:

I applied permissions again, recursively, then restarted the SMB service on hive and the problem went away.
You can also always go to the melange hive console, request a shell, and things always seem to work from there (but you're in FreeBSD world and don't have any beauty scripts like mh-move-torrent!)