Melange history
ANOTHER FUCKING SAMSUNG 8TB 870 QVO BITES THE DUST
- 2024/10/15 REPLACE THE NEXT BAD MINE DRIVE
which one is it... i THINK it was da14... but now hive doesn't say it's this drive: /dev/gptid/6660b7e0-a05b-11ee-898d-ddd4a1cedd1f
here it is in a notification:
  CRITICAL Pool mine state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. The following devices are not healthy: Disk ATA Samsung SSD 870 S5VUNJ0W900939R is UNAVAIL
okay let's offline it and swap it out ffs... with: S5VUNJ0W706311K
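For the record, this is roughly the CLI equivalent of what the TrueNAS UI does (the new disk's gptid is a placeholder; TrueNAS assigns it when it formats the replacement):
# offline the dead member first (gptid from the notification hunt above)
zpool offline mine gptid/6660b7e0-a05b-11ee-898d-ddd4a1cedd1f
# after physically swapping in S5VUNJ0W706311K, kick off the replace
# (NEW_GPTID is a placeholder for whatever TrueNAS creates on the new disk)
zpool replace mine gptid/6660b7e0-a05b-11ee-898d-ddd4a1cedd1f gptid/NEW_GPTID
# then watch the resilver
zpool status -v mine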
- HELL it is MELTING... the sixth disk da13 has now FAULTED... DID THE FAT LADY JUST START SINGING?? The resilver is stuck at 01.52%... eta keeps going up... at 6 DAYS NOW LOL... and we are supposed to watch Pachinko S2E8 dammmmmit! lolsszz
I hear i can possibly run `zpool clear` on the drive to pretend it is okay again. Might work for an SSD? No clue yet... I hate waiting... do i just reboot this fucking VM... I did both. Fucking hell... so utterly stressful...
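For reference, the clear is just this; i don't have da13's gptid handy so that part is a placeholder:
zpool clear mine                     # clear error counts / FAULTED state pool-wide
zpool clear mine gptid/DA13_GPTID    # or just the one faulted device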
I WAITED overnight... and it was... okay-ish the next day?? All disks good!! WTF!
But then it flipped to ONLINE (Unhealthy) based on error count i think? Doing a scrub... FINGERS FIRMLY CROSSED...
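The scrub bits:
zpool scrub mine
zpool status mine    # the 'scan:' line shows scrub progress and any new errors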
- NOW GO GET ANOTHER REPLACEMENT!
GRIM DEATH
2024/03/23 We are harvesting the hive grim drives for use as melange drives for VM storage.
I changed the location of the hive System Dataset Pool from grim to safe.
I copied all grim data to safe (oops, i forgot SharedDownloads... it's empty now...)
I removed the 'grim' pool from FreeNAS.
Now I need to move the drives! I want to keep the PCI card as pass-thru, but the two grim drives are on it.
- make note of all hive drive assignments
- open melange
- remove both grim drives from the PCI passthru
- move the one mine drive that is on SATA from SATA to one of the PCI passthroughs
- move one safe drive from SATA to the other of the PCI passthroughs
- add both grim drives to SATA
- close and restart melange and see if you can reconnect everything
- the grim drives should now show up on melange, not passed through
- the safe and mine drives should show up passed through, but perhaps hive cannot associate them; if not, try to fix
- if not, RESET/KEEP GOING brother!
Let's go...
First, capture everything...
SATA DRIVES:
๐ m@melange [~] sudo lsblk |awk 'NR==1{print $0" DEVICE-ID(S)"}NR>1{dev=$1;printf $0" ";system("find /dev/disk/by-id -lname \"*"dev"\" -printf \" %p\"");print "";}'|grep -v -E 'part|lvm'
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT DEVICE-ID(S)
sda       8:0    0 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FB01 /dev/disk/by-id/wwn-0x500a0751e5a2fb01
sdb       8:16   0 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FD56 /dev/disk/by-id/wwn-0x500a0751e5a2fd56
sdc       8:32   1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D2 /dev/disk/by-id/wwn-0x500a0751e5a313d2
sdd       8:48   1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D6 /dev/disk/by-id/wwn-0x500a0751e5a313d6
sde       8:64   1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2117E59AAE1B /dev/disk/by-id/wwn-0x500a0751e59aae1b
sdf       8:80   1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2121E5A2E131 /dev/disk/by-id/wwn-0x500a0751e5a2e131
sdg       8:96   1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A3009A /dev/disk/by-id/wwn-0x500a0751e5a3009a
sdh       8:112  1   7.3T  0 disk  /dev/disk/by-id/ata-Samsung_SSD_870_QVO_8TB_S5VUNJ0W706320H /dev/disk/by-id/wwn-0x5002538f33710e1d
nvme0n1 259:0    0 931.5G  0 disk  /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_1TB_S4EWNJ0N107994E /dev/disk/by-id/nvme-eui.002538510141169d
104 PASSTHRUS:
๐ m@melange [~] sudo cat /etc/pve/qemu-server/104.conf
boot: order=scsi0;net0
cores: 4
hostpci0: 0a:00.0,rombar=0
memory: 24576
name: hive
net0: virtio=DA:E8:DA:81:EC:64,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-104-disk-0,size=25G
scsi11: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FD56,size=976762584K,backup=no,serial=2122E5A2FD56
scsi12: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FB01,size=976762584K,backup=no,serial=2122E5A2FB01
scsi13: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D2,size=976762584K,backup=no,serial=2122E5A313D2
scsi14: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D6,size=976762584K,backup=no,serial=2122E5A313D6
scsi15: /dev/disk/by-id/ata-CT1000MX500SSD1_2117E59AAE1B,size=976762584K,backup=no,serial=2117E59AAE1B
scsi16: /dev/disk/by-id/ata-CT1000MX500SSD1_2121E5A2E131,size=976762584K,backup=no,serial=2121E5A2E131
scsi17: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A3009A,size=976762584K,backup=no,serial=2122E5A3009A
scsi18: /dev/disk/by-id/ata-Samsung_SSD_870_QVO_8TB_S5VUNJ0W706320H,backup=0,serial=S5VUNJ0W706320H,size=7814026584K
scsihw: virtio-scsi-pci
smbios1: uuid=dc3d077c-0063-41fe-abde-a97674d14dc8
sockets: 1
startup: order=1,up=90
vmgenid: 1dcb048d-122f-4343-ad87-1a01ee8284a6
PRE-MOVE DRIVES:
- 7 1tb safe drives on SATA
- 1 8tb mine drive on SATA
- 2 4tb grim drives on PCI
- 6 8tb mine drives on PCI
PRE-MOVE hive:
da7   2122E5A3009A     931.51 GiB  safe
da8   S5VUNJ0W706320H  7.28 TiB    mine
da11  S5B0NW0NB01796J  3.64 TiB    N/A
da12  S4CXNF0M307721X  3.64 TiB    N/A
POST-MOVE DRIVES:
- 6 1tb safe drives on SATA
- 2 4tb grim drives on SATA
- 1 1tb safe drive on PCI
- 7 8tb mine drives on PCI
STEPS THAT MAY NEED REVERSAL
Do i need to adjust hive or melange before opening the case? I guess i could remove the mine SATA passthru... AND the SAFE drive i'm going to move, too; it will no longer pass through individually (we will be using the PCI card passthru for it).
- shut down VMS, then hive (but not melange)
- remove two drives from SATA passthru, first is 1 from SAFE (moving to PCI card) and second is 1 from MINE (see the qm sketch after this list)
scsi16: /dev/disk/by-id/ata-CT1000MX500SSD1_2121E5A2E131,size=976762584K,backup=no,serial=2121E5A2E131
  ^^^ SWITCHING TO SWAPPING THIS ONE, it is easier to access for swapping
scsi17: /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A3009A,size=976762584K,backup=no,serial=2122E5A3009A
  ^^^ ACTUALLY, NOT THIS ONE, leave it
scsi18: /dev/disk/by-id/ata-Samsung_SSD_870_QVO_8TB_S5VUNJ0W706320H,backup=0,serial=S5VUNJ0W706320H,size=7814026584K
- shut down melange
- remove two 4TB drives from PCI: S5B0NW0NB01796J, S4CXNF0M307721X
- move the only 8TB from SATA to PCI: S5VUNJ0W706320H
- move 1 1TB from SATA to PCI: 2121E5A2E131 (not 2122E5A3009A)
- connect two 4TB drives to SATA: S5B0NW0NB01796J, S4CXNF0M307721X
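If editing 104.conf in emacs feels risky, qm can drop the SATA passthru entries instead; a minimal sketch, using the same scsi slots as the conf above:
sudo qm set 104 --delete scsi16   # the safe drive headed to the PCI card
sudo qm set 104 --delete scsi18   # the mine 8TB headed to the PCI card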
WENT GREAT, hive recognized the move without me doing ANYTHING!!
Now we can set up the ex-grim 4TBs for VM usage, yay.
2023 July full upgrade
I need to install a Windows 11 VM. Also have Ubuntu 20.04 machines that should be moved to 22.04. Figured as good a reason as any for a full upgrade of everything.
- Upgrade bitpost first. Upon reboot, my $*(@ IP changed again. Fuck you google. Spent a while resetting that. Here are the notes (also in my Red Dead RP journal, keepin it real (real accessible when everything's down), lol!):
cast$ ssh bitpost   # not bp or bitpost.com, so we get to the LAN resource
sudo su -
stronger_firewall_and_save
# internet should now work
# get new IP from whatsmyip
# fix bitpost.com DNS at domains.google.com
# WAIT for propagation.... might as well fix the other DNS records...
sudo service dnsmasq restart
ping bitpost.com    # EVENTUALLY this will work! May need to repeat this AND previous step.
- Ask Tom to update E-S DNS to use new IP
- Upgrade abtdev1, then all Ubuntu boxes (glam is toughest), then positronic last, with this pattern:
mh-update-ubuntu          # and reboot
sudo do-release-upgrade   # best to connect directly, but ssh worked fine too
sudo shutdown -h now      # to prep for melange reboot
- Upgrade hive's TrueNAS install, via https://hive CHECK FOR UPDATES, then shut it down
- Update and reboot melange PROXMOX install, via https://melange:8006 Datacenter > melange > Updates
- CHECK EVERYTHING
- proxmox samba share for backups
- samba shares
- at ptl to ensure it can get to positronic
- shitcutter and blogs and wiki and...
- I had a terrible time getting GLAM apache + PHP working again now that Ubuntu uses PHP 8.1; just needed to ENABLE THE MODULE, ffs:
a2enmod php8.1
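The full sequence, in case the module package isn't even installed yet (package name assumed from the standard Ubuntu repos; it was already present for me):
sudo apt install libapache2-mod-php8.1   # assumed; skip if already installed
sudo a2enmod php8.1
sudo systemctl restart apache2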
6.3 > 7.0
Proxmox uses apt for upgrades. I followed this, for the most part.
- Update all VMS
- Shut down all VMS
- Fully update current version's apt packages - this took me from 6.3 to 6.4, a necessary first step.
sudo apt update
sudo apt dist-upgrade
- Upgrade basic apt sources list from buster to bullseye
sudo sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list
# instructions discuss pve-enterprise but i needed to change pve-no-subscription instead - same exact steps, otherwise
# ie, leave this commented out, but might as well set to bullseye:
#   /etc/apt/sources.list.d/pve-enterprise.list
# and update this to bullseye:
#   /etc/apt/sources.list.d/pve-no-subscription.list
- Perform the full upgrade to bullseye / pm 7
sudo apt update
sudo apt dist-upgrade
- Reboot
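Post-reboot sanity check:
pveversion   # should now report pve-manager/7.x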
Manual restart notes
NOTE: This shouldn't be a problem anymore with the newer staged-order restart.
One time the bandit samba shares didn't mount (it comes up too fast, perhaps?). So remount them, then restart qbittorrent-nox:
mh-setup-samba-shares
sudo service qbittorrent-nox restart
I did another round of `apt update && apt dist-upgrade` without stopping containers and it went fine (with bandit fixup still needed after reboot, tho).
sudo apt update
sudo apt dist-upgrade
ssh bandit
mh-setup-samba-shares
sudo service qbittorrent-nox restart
Add 7 1TB raidz
After adding 7 new 1 TB ssds:
๐ m@melange [~] sudo lsblk |awk 'NR==1{print $0" DEVICE-ID(S)"}NR>1{dev=$1;printf $0" ";system("find /dev/disk/by-id -lname \"*"dev"\" -printf \" %p\"");print "";}'|grep -v -E 'part|lvm'
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT DEVICE-ID(S)
sdh       8:112  0 931.5G  0 disk  /dev/disk/by-id/wwn-0x500a0751e5a2fb01 /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FB01
sdi       8:128  0 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FD56 /dev/disk/by-id/wwn-0x500a0751e5a2fd56
sdj       8:144  1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D2 /dev/disk/by-id/wwn-0x500a0751e5a313d2
sdk       8:160  1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A313D6 /dev/disk/by-id/wwn-0x500a0751e5a313d6
sdl       8:176  1 931.5G  0 disk  /dev/disk/by-id/ata-CT1000MX500SSD1_2117E59AAE1B /dev/disk/by-id/wwn-0x500a0751e59aae1b
sdm       8:192  1 931.5G  0 disk  /dev/disk/by-id/wwn-0x500a0751e5a2e131 /dev/disk/by-id/ata-CT1000MX500SSD1_2121E5A2E131
sdn       8:208  1 931.5G  0 disk  /dev/disk/by-id/wwn-0x500a0751e5a3009a /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A3009A
nvme0n1 259:0    0 931.5G  0 disk  /dev/disk/by-id/nvme-eui.002538510141169d /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_1TB_S4EWNJ0N107994E
Before adding 7 new 1 TB ssds:
๐ m@melange [~] ls /dev/
autofs           dm-8       i2c-7         net        stdin  tty28  tty5       ttyS12  ttyS6    vcsu1
block            dm-9       i2c-8         null       stdout tty29  tty50      ttyS13  ttyS7    vcsu2
btrfs-control    dri        i2c-9         nvme0      tty    tty3   tty51      ttyS14  ttyS8    vcsu3
bus              ecryptfs   initctl       nvme0n1    tty0   tty30  tty52      ttyS15  ttyS9    vcsu4
char             fb0        input         nvme0n1p1  tty1   tty31  tty53      ttyS16  udmabuf  vcsu5
console          fd         kmsg          nvme0n1p2  tty10  tty32  tty54      ttyS17  uhid     vcsu6
core             full       kvm           nvme0n1p3  tty11  tty33  tty55      ttyS18  uinput   vfio
cpu              fuse       lightnvm      nvram      tty12  tty34  tty56      ttyS19  urandom  vga_arbiter
cpu_dma_latency  gpiochip0  log           port       tty13  tty35  tty57      ttyS2   userio   vhci
cuse             hpet       loop0         ppp        tty14  tty36  tty58      ttyS20  vcs      vhost-net
disk             hugepages  loop1         pps0       tty15  tty37  tty59      ttyS21  vcs1     vhost-vsock
dm-0             hwrng      loop2         psaux      tty16  tty38  tty6       ttyS22  vcs2     watchdog
dm-1             i2c-0      loop3         ptmx       tty17  tty39  tty60      ttyS23  vcs3     watchdog0
dm-10            i2c-1      loop4         ptp0       tty18  tty4   tty61      ttyS24  vcs4     zero
dm-11            i2c-10     loop5         pts        tty19  tty40  tty62      ttyS25  vcs5     zfs
dm-12            i2c-11     loop6         pve        tty2   tty41  tty63      ttyS26  vcs6
dm-13            i2c-12     loop7         random     tty20  tty42  tty7       ttyS27  vcsa
dm-14            i2c-13     loop-control  rfkill     tty21  tty43  tty8       ttyS28  vcsa1
dm-2             i2c-14     mapper        rtc        tty22  tty44  tty9       ttyS29  vcsa2
dm-3             i2c-2      mcelog        rtc0       tty23  tty45  ttyprintk  ttyS3   vcsa3
dm-4             i2c-3      mem           shm        tty24  tty46  ttyS0      ttyS30  vcsa4
dm-5             i2c-4      mpt2ctl       snapshot   tty25  tty47  ttyS1      ttyS31  vcsa5
dm-6             i2c-5      mpt3ctl       snd        tty26  tty48  ttyS10     ttyS4   vcsa6
dm-7             i2c-6      mqueue        stderr     tty27  tty49  ttyS11     ttyS5   vcsu
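Each of these then gets handed to hive as a raw disk passthru, one qm set per drive; whether done via the UI or the CLI, this is what lands in 104.conf (it matches the scsi11 entry above):
sudo qm set 104 -scsi11 /dev/disk/by-id/ata-CT1000MX500SSD1_2122E5A2FD56,backup=no,serial=2122E5A2FD56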
macOS USB passthru failed attempt
USB passthru from the Proxmox UI doesn't work on macOS. Tried setting usb mapping via console, following this:
sudo qm monitor 111
qm> info usbhost
qm> quit
sudo qm set 111 -usb1 host=05ac:12a8
No luck, same result. Reading his remarks on USB forwarding, try resetting machine type:
machine: pc-q35-6.0   (instead of latest, which was 6.2 at time of writing)

remove this from /etc/pve/qemu-server/111.conf:
-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
Hmm.. perhaps it is a conflict between Nick's usb keyboard config and my usb port selection... try plugging usb into another port and remapping...
No luck. FFS. Reset to 6.2 and see if we have any luck with hotplug line removed from config... Nope.
Keep trying permutations... nothing from googling indicates that this shouldn't just FUCKING WORK...
Remove this and re-add the hotplug line, on the off chance it shouldn't be used with q35 v6.2:
-global nec-usb-xhci.msi=off
Nope, that just caused a problem with "Springboard", not working on this Mac, or some shit. Re-adding the line...
Well what now? Google more?
Update and reboot proxmox and retry... no luck.
Try changing from blue to light-blue port... the device is mapped so it should be passed through... nope.
Try this guy's approach to mount an EFI Disk
lsusb
  Bus 004 Device 009: ID 05ac:12a8 Apple, Inc. iPhone 5/5C/5S/6/SE
ls -al /dev/bus/usb/004/009
  crw-rw-r-- 1 root root 189, 392 Jul 22 16:10 /dev/bus/usb/004/009
sudo emacs /etc/pve/qemu-server/111.conf
  lxc.cgroup.devices.allow: c 189:* rwm
  lxc.mount.entry: /dev/bus/usb/004 dev/bus/usb/004 none bind,optional,create=dir
Nope.
Try mapping the port instead of device ID, from the Proxmox UI... Nope.
How can i check the apple side for any issues? straight up google for that, macOS not seeing a USB device.
System Information > USB > nada
hrmphhhh. Never got it working. RE-google next month maybe...
During original configuration, I added samba shares manually.
sudo emacs /etc/fstab   # and paste samba stanza from another machine
sudo emacs /root/samba_credentials
sudo mkdir /spiceflow && sudo chmod 777 /spiceflow
๐ m@melange [~] mkdir /spiceflow/bitpost
๐ m@melange [~] mkdir /spiceflow/grim
๐ m@melange [~] mkdir /spiceflow/mack
๐ m@melange [~] mkdir /spiceflow/reservoir
๐ m@melange [~] mkdir /spiceflow/sassy
๐ m@melange [~] mkdir /spiceflow/safe
Now you can mount em up and hang em high!
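For reference, each /etc/fstab stanza looks something like this (the share name and mount options here are hypothetical; copy the real stanza from another machine as noted above):
# hypothetical example stanza; one line per share
//sassy/sassy  /spiceflow/sassy  cifs  credentials=/root/samba_credentials,uid=m,gid=m,iocharset=utf8  0  0

# then mount everything listed in fstab
sudo mount -a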