TrueNAS

From Bitpost wiki

Overview

Pools

TrueNAS provides storage via Pools. A pool is a bunch of raw drives gathered and managed as a set. Each of my pools is one of these types:

Pool type               | Description
single drive            | no TrueNAS advantage other than health checks
raid1 pair              | mirrored drives give normal write speeds, fast reads, single-fail redundancy; costs half of storage potential
raid0 pair              | striped drives give fast writes and fast reads, no redundancy, no storage cost
raid of multiple drives | raidz: a balance of read/write speed, redundancy, and storage potential

The three levels of raidz are:

  • raidz: one drive's worth of space is consumed by parity (no data storage, i.e. you only get (n-1) drives of storage total), and one drive can be lost without losing any data; fastest; very dangerous to recover from a lost drive (the "resilver" process is brutal on the remaining drives, so don't wait)
  • raidz2: two drives' worth of parity, two can be lost
  • raidz3: three drives' worth of parity, three can be lost; slowest
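
A quick way to sanity-check the storage cost of each level: usable space is roughly (drives - parity) x drive size. This is only a sketch. The function name is made up, and it ignores ZFS metadata and padding overhead, so real numbers come out a bit lower.

```shell
# Rough usable capacity of one raidz vdev, in TB.
# raidz_usable_tb is a hypothetical helper, not a TrueNAS or ZFS command.
# Args: number of drives, size of each drive in TB, parity level (1, 2 or 3).
raidz_usable_tb() {
    drives="$1"; size_tb="$2"; parity="$3"
    echo $(( (drives - parity) * size_tb ))
}

raidz_usable_tb 7 8 1   # raidz  -> 48
raidz_usable_tb 7 8 2   # raidz2 -> 40
raidz_usable_tb 7 8 3   # raidz3 -> 32
```

So for seven 8TB drives, each extra level of parity costs another 8TB of usable space.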

Datasets

Every pool should have one child dataset. This is where we set the permissions, which matter for Samba access. We could have more than one child dataset, but I haven't had the need.

Adding

hive > Storage > Pools > mine (or any newly created pool) > Add Dataset

Dataset settings:

name #pool#-ds
share type SMB

Save, then continue...

hive > Storage > Pools > mine (or any newly created pool) > mine-ds > Edit ACL
user m
group m
ACL
 who everyone@
 type Allow
 Perm type Basic   (NOTE: "Perm type Basic" is important!)
 Perm Full control (NOTE: this is not the default, you will need to change it)
 Flags type Basic
 Flags Inherit     (NOTE: this is not the default, you will need to change it)
(press REMOVE on all other ACL blocks)
SAVE

Windows SMB Shares

Share each dataset as a Samba share under:

Sharing > Windows Shares (SMB)
  • Use the pool name for the share name.
  • Use the same ACL as for the dataset.
  • Purpose: No presets

WARNING: I had to set these Auxiliary Parameters in the SMB config so that symlinks would be followed.

  • Services > SMB > Actions > configuration > Auxiliary Parameters:
unix extensions = no
follow symlinks = yes
wide links = yes
  • Stop and restart SMB service

Maintenance

Burn in a new drive

ALWAYS do this even tho it's a PITA. Less pain than not doing it.

I didn't do it for my 7-8TB-drive raidz. Murphy said FUCK YOU and one of the eight went bad. So... do the test, dumbass.

But of course I found a way to stay lazy... TrueNAS has the ability to run SMART tests directly on a drive, so do it there. Or just wait for SMART failures to show up. God damn, laziness rules. Maybe. Fool.
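
If you'd rather kick off the SMART tests from a shell than from the UI, the sequence looks something like this. It is a sketch only: /dev/ada0 is a placeholder device name (an assumption, not from these notes), and the helper just prints the smartctl commands so you can eyeball them before running anything. Each self-test runs in the background on the drive itself, so wait for one to finish before starting the next.

```shell
# Print (do not run) a SMART burn-in sequence for one drive.
# smart_burn_in_plan is a hypothetical helper; the smartctl flags are real.
smart_burn_in_plan() {
    dev="$1"
    echo "smartctl -t short $dev"      # quick self-test
    echo "smartctl -t long $dev"       # full surface scan, can take many hours on big drives
    echo "smartctl -l selftest $dev"   # self-test log, shows how each test ended
    echo "smartctl -H $dev"            # overall health verdict
}

# /dev/ada0 is a placeholder; substitute the new drive's device name.
smart_burn_in_plan /dev/ada0
```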

Regularly do SMART, scrub, resilver

YOU MUST DO THIS REGULARLY!

From here:

A drive, vdev or pool is declared degraded if ZFS detects problems with the data. If you reboot the error count is reset. A resilver will heal the data errors if there is sufficient redundancy. ZFS will only spot the data issues on read, that’s why we have scrubs, a forced read of all the data to try and determine if there are any errors. So schedule regular scrubs are important. This will not tell you why the data is corrupted, for this you have S.M.A.R.T tests, you need to schedule those as well, both long and short.

to get a handle on the situation as is, you need to trigger a scrub and long smart tests.

Never do more than one of these at a time, and never do any of them during heavy disk usage (backups, eg).

SMART can be done weekly (not too often or it will contribute to early wear-out of SSDs).

Same for scrub.

Resilver happens when a drive is replaced or comes back after a fault, and ZFS has to rebuild that drive's data from the remaining drives. Buckle up for this one!
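
The "never more than one at a time" rule can be partly checked from a shell before starting a scrub by hand. This is a minimal sketch, assuming the usual `zpool status` wording ("scrub in progress" / "resilver in progress" on the scan line). pool_busy and scrub_if_idle are made-up helper names, and this does not detect running SMART tests or backups.

```shell
# Succeeds (exit 0) if the `zpool status` text on stdin shows a scrub
# or resilver currently running. Hypothetical helper, not a ZFS command.
pool_busy() {
    grep -Eq '(scrub|resilver) in progress'
}

# Start a scrub only if the pool is not already scrubbing or resilvering.
# Defined here but not called; meant to be run on the NAS itself.
scrub_if_idle() {
    pool="$1"
    if zpool status "$pool" | pool_busy; then
        echo "$pool is busy, try again later"
        return 1
    fi
    zpool scrub "$pool"
}

# Example on the NAS (the pool name "safe" is from these notes):
#   scrub_if_idle safe
```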

Pool speed check

CAST to SAFE: ~114MB/s write (compressed) on 60MB/s network

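Do this to test raw write speed from anywhere on the LAN to the [safe] pool. Note that /dev/zero data compresses to almost nothing, so with pool compression on, the result can beat the network limit: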

dd if=/dev/zero of=/mnt/safe/safe-ds/speedtest.data bs=4M count=10000
# on hive: 42GB transferred in ~15sec at ~2.9GB/sec, WOW
# on cast: 42GB copied in 371sec at 114MB/s - that seems in line with my network speed (see below)
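
To double-check a dd result, the arithmetic is just block size x count / seconds. A small sketch (dd_throughput_mb is a made-up helper name; it reports decimal MB/s, rounded down):

```shell
# Throughput in MB/s (decimal, rounded down) for a dd run.
# Args: block size in bytes, block count, elapsed seconds.
dd_throughput_mb() {
    bs_bytes="$1"; count="$2"; secs="$3"
    echo $(( bs_bytes * count / secs / 1000000 ))
}

# bs=4M is 4194304 bytes; 10000 blocks in 371 seconds (the cast run above)
dd_throughput_mb 4194304 10000 371   # -> 113
```

That lands within a hair of the ~114MB/s dd reported, so the numbers hang together.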

To test the network bandwidth limit:

# on hive
iperf -s -w 2m # run in server mode with a 2MB TCP window
# on another LAN machine
iperf -c hive -w 2m -t 30 -i 1 # 30 second test, report every second
# on cast: 1.51 GB at 477Mbits/sec aka 60MB/sec
# I have a 1Gb switch (~125MB/sec on paper), I guess this is all we get out of it?
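
iperf reports megabits and dd reports megabytes, so a conversion helps when comparing the two. A tiny sketch (mbit_to_mbyte is a made-up helper; it just divides by 8 and ignores protocol overhead):

```shell
# Convert Mbits/sec to MB/sec, rounded to the nearest whole number.
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.0f\n", mbit / 8 }'
}

mbit_to_mbyte 477    # the iperf result above -> 60
mbit_to_mbyte 1000   # a 1Gb link on paper    -> 125
```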

Replace a bad disk in a raidz pool

My 7-drive raidz arrays can only lose ONE drive before they go boom, so you MUST replace bad disks immediately. raidz2 can lose two drives, raidz3 three, but SSD raidz you-can-lose-one-drive is, to me, a sweet spot.

  • Watch TrueNAS for CRITICAL alerts that indicate a drive is failing its self-tests.
  • Make note of its serial number.
  • Find the drive in the pool, make note of its drive id (not needed but no harm).
  • Change the pool drive status from FAULTED to OFFLINE

Storage > Pools > badpool > triple-dot Status > baddrive > triple-dot-status > FAULTED to OFFLINE

  • Power down the whole fucking PROXMOX machine
  • Pull it, and swap out the bad drive for a good one
  • Power back up, then replace the drive in the pool

Storage > Pools > badpool > baddrive > triple-dot-status > REPLACE
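
For the serial number step, it helps to confirm from a shell which physical drive you are about to pull. A sketch, assuming the usual `smartctl -i` output with a "Serial Number:" line. serial_of is a made-up helper, and the device name and sample text below are placeholders.

```shell
# Print the serial number from `smartctl -i` output read on stdin.
# Hypothetical helper; on the NAS you would run something like:
#   smartctl -i /dev/ada0 | serial_of
serial_of() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2 }'
}

# Demo with made-up sample text instead of a real drive:
printf 'Device Model:     EXAMPLE DRIVE\nSerial Number:    ABC123\n' | serial_of   # -> ABC123
```

Match the printed serial against the label on the drive before swapping anything.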

Remove a bad pool

  • Make note of which drives use the pool; likely some are bad and some are good and perhaps worth reusing elsewhere.
  • Disconnect SMB connections to the pool
    • Update valid shares in mh-setup-samba-shares
    • Rerun mh-setup-samba-shares everywhere (eventually anyway)
    • One possible easier way to get SMB disconnected from the pool is to stop SMB service in TrueNAS
    • Sadly, to get through this for my splat pool, I had to try removing the pool (it failed), restart hive, then remove the pool again.
  • Pool > (gear) > Export/disconnect
    • [x] Delete configuration of shares that use this pool (to remove the associated SMB share)
    • [x] Destroy data on this pool (you MUST select this or the silly thing will attempt to export the data)

Update TrueNAS

Updating is baked into the UI, nice! And I have auto-updates enabled. So nice.

These guys work hard on this, to make sure releases are well tested. Watch for alerts about newly available updates. Do not update past the current release!

System > Update > [Train] (ensure you have a good one selected; on occasion, you'll want to CHANGE it to select a newer stable release!)
Give the system a minute to load available updates...
Press Download available updates > The modal will ask if you want to apply and restart > Say yes

That's about it!

Configuration

Set up user

I set up m user (1000) and m group (1000)

Set up alert emails

Go to one of your Google accounts to get an App password. It has to be an account that has 2FA turned on, bleh, so don't use moodboom@gmail.com. I went with abettersoftwaretrader@gmail.com.

Accounts > Users > root > edit password > abettersoftwaretrader@gmail.com
System > Email > from email > abettersoftwaretrader@gmail.com, smtp.gmail.com 465 Implicit SSL, SMTP auth: (email/App password)

Then you can test it here:

System > Email > (at bottom, next to Save...) Send Test Email

Set up user ssh

This was not fun.

  • Set up user
  • You have to set password ON and make sure to check [x] Allow sudo
  • Make sure to allow Samba Authentication for m user that is used for samba
  • Add public key to user
  • Create a valid folder on the /mnt NAS shares for the user's home; you can mkdir using samba; I created:
/mnt/safe/safe-ds/software/apps/hive-home
  • set the user's home to that ^; turn off password auth
  • Turn on SSH service
  • System > SSH Keypairs > Add SSH keypair for main user m
  • System > SSH Connections > Add, use localhost, keypair from prev step

It should work but it does not!

  • Open a TrueNAS prompt via the Proxmox console
  • Go to the home dir, there should be an .ssh there now
  • Reduce permissions on both HOME DIR (700) and .ssh/KEY (400)
  • Get a shell and run `sudo visudo` and add this line:
m ALL=(ALL) NOPASSWD: ALL

Finally! It works!
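
The permissions step can be sketched as a small helper. Assumptions: fix_ssh_perms is a made-up name, and the key file name is passed in because these notes don't say which file under .ssh is meant.

```shell
# Tighten permissions so sshd will accept the key:
# home dir -> 700, key file under .ssh -> 400.
# Args: home directory, key file name inside .ssh (assumption: you know which one).
fix_ssh_perms() {
    home_dir="$1"; key_file="$2"
    chmod 700 "$home_dir"
    chmod 400 "$home_dir/.ssh/$key_file"
}

# Example on hive, using the home dir from these notes and a
# placeholder key file name:
#   fix_ssh_perms /mnt/safe/safe-ds/software/apps/hive-home authorized_keys
```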

Troubleshooting

SOME of my shares were throwing Permission Denied errors on mv. Solutions:

  • I applied permissions again, recursively, then restarted the SMB service on hive and the problem went away.
  • You can also always go to the melange hive console, request a shell, and things always seem to work from there (but you're in FreeBSD world and don't have any beauty scripts like mh-move-torrent!)