Postgres: Difference between revisions

From Bitpost wiki
 
(58 intermediate revisions by the same user not shown)
Line 1: Line 1:
=== DEV INSTALL ===
=== Usage ===
* clone and install [https://github.com/postgres/postgres postgres]
cd ~/development/Libraries/c
git clone git@github.com:postgres/postgres.git
cd postgres
mkdir default_configure && cd default_configure
../configure --prefix=/usr/local && make && sudo make install
# may need: sudo apt install libreadline-dev bison flex
# use apt search libreadline (etc)
* clone and install [https://github.com/jtv/libpqxx libpqxx]
cd ~/development/Libraries/c++
git clone git@github.com:jtv/libpqxx.git
cd libpqxx
# add pg bin folder to path
export PATH=$PATH:/usr/local/bin
mkdir default_configure && cd default_configure
../configure --disable-documentation && make && sudo make install
* set up a data folder where postgres will put all its data:
# Set up a symlink to a larger RAID drive
mkdir /spiceflow/softraid/development/postgres-data
sudo mkdir /usr/local/pgsql
cd /usr/local/pgsql
ln -s /spiceflow/softraid/development/postgres-data data


=== START SERVER ===
==== PGAdmin ====
You must have a postgres user:
 
  # on gentoo
PGAdmin is running on bitpost and can connect to any db on the LAN.
  sudo useradd postgres
 
==== Make a copy of a table ====
 
CREATE TABLE new_table AS SELECT * FROM existing_table (WHERE ...);
 
==== Readable UNIX epoch ====
 
select to_timestamp((json->>'timestamp')::int) at time zone 'utc', * from stockquotes where (json->>'quote')::float < 0.0;
 
Yes it's that ridiculous.  Yes it's [https://dba.stackexchange.com/a/80114 really even worse than that].
 
==== Table sizes ====
More ridiculous postgres overengineered obfuscated NONSENSE:
  SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC;
Optionally add LIMIT 5 or whatever.
 
=== Indexing ===
My discoveries:
* use single arrow not double on root of any ::bigint or other :: conversion!  to avoid double-convert
* use :: on any where/order/join field!
* Always carefully precisely match index fields and query fields.  Single-arrow that shit! [https://dba.stackexchange.com/questions/297734/postgresql-multi-field-index-on-jsonb-data-does-full-tablescan-when-field-is-not Me being dumb in public]
Things are working pretty well.


If you installed as above, this is the command to start the server:
* NEXT: EXPLORE INDEXING further!
  su postgres -c '/usr/local/bin/pg_ctl start -l /usr/local/pgsql/log.txt -D /usr/local/pgsql/data'
  https://blog.codeship.com/unleash-the-power-of-storing-json-in-postgres/


=== NOSQL via JSON ===
=== JSONB Usage ===
GORGEOUS POSTGRES JSON QUERY! inject blobs of JSON, then you can crack open the json inside a cell!
Read [https://www.postgresql.org/docs/current/functions-json.html this] carefully!
* Add columns of type 'jsonb' (JSON binary).  
* Inject cells of any valid JSON format (big data!). 
* Then using special syntax, you can crack open the json inside a cell to query, create, update, delete:
     CREATE TABLE stocks (
     CREATE TABLE stocks (
       id integer NOT NULL,
       id integer NOT NULL,
      board_id integer NOT NULL,
       data jsonb
       data jsonb
     );
     );
     INSERT INTO stocks VALUES (1, 1, '{"symbol": "GOOG", "tags": ["hot", "trending"], "owned": true}');
     INSERT INTO stocks VALUES (1, '{"symbol": "GOOG", "tags": ["hot", "trending"], "owned": true}, "quantity": 12');


     # returns count of data that HAS the field ingredients
     # returns count of data that HAS the field ingredients
     SELECT count(*) FROM stocks WHERE data ? 'ingredients';
     SELECT count(*) FROM stocks WHERE data ? 'tags';


     # ->> will return the property as integer or text (the parsed form of the attribute).
     # ->> will return the property as integer or text (the parsed form of the attribute).
     SELECT * FROM stocks WHERE data->>'owned' = 'true';        # if data.owned == true
     SELECT * FROM stocks WHERE data->>'owned' = 'true';        # if data.owned == true
    # you may need to cooerce the type, sigh...
    SELECT * FROM stocks WHERE (data->>'quantity')::int > 5;


     # -> will return the attribute as a JSON object
     # -> will return the attribute as a JSON object
Line 55: Line 66:
         (2 rows)
         (2 rows)


    # NEXT: EXPLORE INDEXING!
==== Field(s) update ====
    https://blog.codeship.com/unleash-the-power-of-storing-json-in-postgres/
So easy with v12 syntax:
UPDATE mytable SET json = json || '{"order_id": ""}';
update prefint set json = json || '{"value":1}' where json->>'name' = 'LoggingLevel';
 
==== Field(s) delete ====
UPDATE mytable SET json = json - 'custom_order_id';
 
==== Upserts ====
Upsert example, [https://stackoverflow.com/questions/42567972/postgres-upsert-json-props from SO]:
INSERT INTO accounts(id, json) VALUES ('1', '{##somejson##}');
INSERT INTO accounts(id, json)
VALUES ('1', '{##somejson##}')
ON CONFLICT (id) DO
UPDATE SET json = '{##somejson##}'
RETURNING *;
 
==== Fix null values ====
update autotradeparametersets set json = json || '{"analysis_percent_change": 0.0}' where (json->'analysis_percent_change')::text = 'null';
 
=== Install ===
 
* You can either install the system postgres server:
mh-install-postgres
Or install the client so you can develop code that connects to a remote server:
mh-install-postgres-client
 
* The config files are here (created and stowed by the script).  You should read and update them to adjust the performance settings to match the hardware.
/var/lib/postgresql/13/main/[postgresql.conf,pg_hba.conf]
# symlinked to...
  /home/m/development/config/ubuntu/<hostname>/var/lib/postgresql/13/main/[postgresql.conf,pg_hba.conf]
 
Always change the postgres user's default password:
sudo su postgres
psql
\password postgres
(Enter new password... add it to private)
\q
 
==== Start server ====
Server should be a standard service that starts on boot.
sudo service postgresql [status|etc..]
 
Use mh-start-postgres to start postgres (if the server is not already running):
mh-start-postgres
 
Once a new server is started, verify you can connect to it via our LAN PGAdmin.
 
==== ABT postgres tools ====
* Set up databases - Note that [at setup] will take care of this for you...
mh-add-postgres-db at_live
mh-add-postgres-db at_test
 
* To open a shell:
at dpg
at ppg
 
=== Config ===
==== Configure remote access ====
postgresql.conf:
# MDM We must also LISTEN if we want to allow any remote connections.
# Also see pg_hba.conf for remote permission details.
listen_addresses = '*'
 
pg_hba.conf:
# MDM my lan connection
host all all abtdev1.bitpost.lan md5
host all all cast.bitpost.lan md5
 
==== Add a user + db ====
Use mh-add-postgres-db to add a user (aka role) and matching database.  The user only has access to that db (best practice).  This will prompt for pw.
mh-add-postgres-db [name]
# Add name+pw to private
 
==== Backup ====
 
Look at [at import] and [at export], they do the work.
 
=== Troubleshooting ===
 
==== Finding bad huge data ====
* I had bad data (huge string written to a json value) and found it by looking at the json length not the actual json (which was huge). 
select id, length(json::text) from cycles;
Then I could clean up the rogue field with:
update cycles set json = json || '{"order_id": ""}';
 
==== Updating old passwords to use latest cipher ====
Cipher recommendation changed from md5 to scram-sha-256.  Add this line to upgrade from md5 to scram:
sudo emacs -nw /etc/postgresql/12/main/postgresql.conf
    # MDM upgrade from md5 to scram                                         
    password_encryption = scram-sha-256
After upgrading and restarting postgres, old md5 passwords will start to fail.  To update an old password, just retype them:
sudo su postgres
psql
\password ##username##
 
==== Old notes that may help ====
Postgres is a NASTY BEAST with SHITTY DEFAULT CONFIGS...
 
* set up a linux user named "postgres"
# On gentoo:
sudo useradd postgres
# On ubuntu:
sudo adduser postgres
# Put the password in private
* set up a data folder where postgres will put all its data and its log
# Standardize on [/home/m/development/postgres], but always consider symlinking to a large/redundant location.
# Here, we are choosing to use a larger RAID drive.
# WARNING NTFS will not let you chown a folder (you'd have to mount the whole thing as the target owner). 
# So on viper, I had to use the tiny primary drive directly, no symlink to ~m/bigdata.  Fuck off.
mkdir /spiceflow/softraid/development/postgres
cd ~/development && ln -s /spiceflow/softraid/development/postgres postgres
sudo chown -R postgres:postgres postgres /spiceflow/softraid/development/postgres
sudo chmod g+x /spiceflow/softraid/development # and any other parent that is missing this bit, so postgres user can cd to its folder
# We must init the data folder as a postgres "database cluster"
su postgres -c '/usr/local/bin/initdb -D /home/m/development/postgres/data'
 
'''Backup'''
 
To dump a database called mydb into a SQL-script file:
$ pg_dump mydb > db.sql
To reload such a script into a (freshly created) database named newdb:
$ psql -d newdb -f db.sql
See [https://www.postgresql.org/docs/11/app-pgdump.html here] for more.
 
''' Disabling multiple installs '''
 
There may be one or more system postgres'es already installed.  That's fine, just make sure they don't start.
sudo emacs -nw /etc/postgresql/(tab-tab)
sudo emacs -nw /etc/postgresql/10/main/start.conf # change auto > manual
sudo emacs -nw /etc/postgresql/9.6/main/start.conf # change auto > manual
sudo service postgres(tab-tab to determine if there are more than one) stop
On Ubuntu, watch out for multiple installs causing the postgres port to change from the default.  If you have two previous postgres versions, version 12 will use port 5432+1+1 = 5434.  This is very confusing.  You can see what postgres versions are running on what port in Ubuntu with this:
pg_lsclusters
 
''' Configure '''
 
You need to create two config files, postgresql.conf and pg_hba.conf.
 
There is a great configuration tool [https://pgtune.leopard.in.ua/#/ here] to generate postgresql.conf.  Example result:
 
<pre>
# DB Version: 10
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 16 GB
# CPUs num: 4
# Connections num: 1000
# Data Storage: hdd
 
max_connections = 1000
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 2097kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
</pre>
 
You must provide the pg_hba.conf file to specify access control.  Copy it from other installs.  Make sure to add any host-to-host access you need.
 
Here is the location to put the config files when installed from source with [mh-install-postgres]:
/home/m/development/postgres/data/[postgresql.conf,pg_hba.conf]

Latest revision as of 12:14, 12 September 2023

Usage

PGAdmin

PGAdmin is running on bitpost and can connect to any db on the LAN.

Make a copy of a table

CREATE TABLE new_table AS SELECT * FROM existing_table (WHERE ...);

Readable UNIX epoch

select to_timestamp((json->>'timestamp')::int) at time zone 'utc', * from stockquotes where (json->>'quote')::float < 0.0;

Yes it's that ridiculous. Yes it's really even worse than that.

Table sizes

More ridiculous postgres overengineered obfuscated NONSENSE:

SELECT nspname || '.' || relname AS "relation",
   pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
 FROM pg_class C
 LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
 WHERE nspname NOT IN ('pg_catalog', 'information_schema')
   AND C.relkind <> 'i'
   AND nspname !~ '^pg_toast'
 ORDER BY pg_total_relation_size(C.oid) DESC;

Optionally add LIMIT 5 or whatever.

Indexing

My discoveries:

  • use single arrow not double on root of any ::bigint or other :: conversion! to avoid double-convert
  • use :: on any where/order/join field!
  • Always carefully precisely match index fields and query fields. Single-arrow that shit! Me being dumb in public

Things are working pretty well.

  • NEXT: EXPLORE INDEXING further!
https://blog.codeship.com/unleash-the-power-of-storing-json-in-postgres/

JSONB Usage

Read this carefully!

  • Add columns of type 'jsonb' (JSON binary).
  • Inject cells of any valid JSON format (big data!).
  • Then using special syntax, you can crack open the json inside a cell to query, create, update, delete:
   CREATE TABLE stocks (
     id integer NOT NULL,
     data jsonb
   );
   INSERT INTO stocks VALUES (1, '{"symbol": "GOOG", "tags": ["hot", "trending"], "owned": true}, "quantity": 12');
   # returns count of data that HAS the field ingredients
   SELECT count(*) FROM stocks WHERE data ? 'tags';
   # ->> will return the property as integer or text (the parsed form of the attribute).
   SELECT * FROM stocks WHERE data->>'owned' = 'true';         # if data.owned == true
   # you may need to cooerce the type, sigh...
   SELECT * FROM stocks WHERE (data->>'quantity')::int > 5;
   # -> will return the attribute as a JSON object
   SELECT jsonb_array_elements_text(data->'tags') as tag FROM stocks WHERE id = 1;
   # returns the actual tag values, eg:
       tag
       --------------
       Improvements
       Office
       (2 rows)

Field(s) update

So easy with v12 syntax:

UPDATE mytable SET json = json || '{"order_id": ""}';
update prefint set json = json || '{"value":1}' where json->>'name' = 'LoggingLevel';

Field(s) delete

UPDATE mytable SET json = json - 'custom_order_id';

Upserts

Upsert example, from SO:

INSERT INTO accounts(id, json) VALUES ('1', '{##somejson##}'); 

INSERT INTO accounts(id, json) 
VALUES ('1', '{##somejson##}')
ON CONFLICT (id) DO 
UPDATE SET json = '{##somejson##}'
RETURNING *;

Fix null values

update autotradeparametersets set json = json || '{"analysis_percent_change": 0.0}' where (json->'analysis_percent_change')::text = 'null';

Install

  • You can either install the system postgres server:
mh-install-postgres

Or install the client so you can develop code that connects to a remote server:

mh-install-postgres-client
  • The config files are here (created and stowed by the script). You should read and update them to adjust the performance settings to match the hardware.
/var/lib/postgresql/13/main/[postgresql.conf,pg_hba.conf]
# symlinked to...
 /home/m/development/config/ubuntu/<hostname>/var/lib/postgresql/13/main/[postgresql.conf,pg_hba.conf]

Always change the postgres user's default password:

sudo su postgres
psql
\password postgres
(Enter new password... add it to private)
\q

Start server

Server should be a standard service that starts on boot.

sudo service postgresql [status|etc..]

Use mh-start-postgres to start postgres (if the server is not already running):

mh-start-postgres

Once a new server is started, verify you can connect to it via our LAN PGAdmin.

ABT postgres tools

  • Set up databases - Note that [at setup] will take care of this for you...
mh-add-postgres-db at_live
mh-add-postgres-db at_test
  • To open a shell:
at dpg
at ppg

Config

Configure remote access

postgresql.conf:

# MDM We must also LISTEN if we want to allow any remote connections.
# Also see pg_hba.conf for remote permission details.
listen_addresses = '*'

pg_hba.conf:

# MDM my lan connection
host all all abtdev1.bitpost.lan md5
host all all cast.bitpost.lan md5

Add a user + db

Use mh-add-postgres-db to add a user (aka role) and matching database. The user only has access to that db (best practice). This will prompt for pw.

mh-add-postgres-db [name]
# Add name+pw to private

Backup

Look at [at import] and [at export], they do the work.

Troubleshooting

Finding bad huge data

  • I had bad data (huge string written to a json value) and found it by looking at the json length not the actual json (which was huge).
select id, length(json::text) from cycles;

Then I could clean up the rogue field with:

update cycles set json = json || '{"order_id": ""}';

Updating old passwords to use latest cipher

Cipher recommendation changed from md5 to scram-sha-256. Add this line to upgrade from md5 to scram:

sudo emacs -nw /etc/postgresql/12/main/postgresql.conf
   # MDM upgrade from md5 to scram                                          
   password_encryption = scram-sha-256

After upgrading and restarting postgres, old md5 passwords will start to fail. To update an old password, just retype them:

sudo su postgres
psql
\password ##username##

Old notes that may help

Postgres is a NASTY BEAST with SHITTY DEFAULT CONFIGS...

  • set up a linux user named "postgres"
# On gentoo:
sudo useradd postgres
# On ubuntu:
sudo adduser postgres
# Put the password in private
  • set up a data folder where postgres will put all its data and its log
# Standardize on [/home/m/development/postgres], but always consider symlinking to a large/redundant location.
# Here, we are choosing to use a larger RAID drive.
# WARNING NTFS will not let you chown a folder (you'd have to mount the whole thing as the target owner).  
# So on viper, I had to use the tiny primary drive directly, no symlink to ~m/bigdata.  Fuck off.
mkdir /spiceflow/softraid/development/postgres
cd ~/development && ln -s /spiceflow/softraid/development/postgres postgres
sudo chown -R postgres:postgres postgres /spiceflow/softraid/development/postgres
sudo chmod g+x /spiceflow/softraid/development # and any other parent that is missing this bit, so postgres user can cd to its folder

# We must init the data folder as a postgres "database cluster"
su postgres -c '/usr/local/bin/initdb -D /home/m/development/postgres/data'

Backup

To dump a database called mydb into a SQL-script file:

$ pg_dump mydb > db.sql

To reload such a script into a (freshly created) database named newdb:

$ psql -d newdb -f db.sql

See here for more.

Disabling multiple installs

There may be one or more system postgres'es already installed. That's fine, just make sure they don't start.

sudo emacs -nw /etc/postgresql/(tab-tab)
sudo emacs -nw /etc/postgresql/10/main/start.conf # change auto > manual
sudo emacs -nw /etc/postgresql/9.6/main/start.conf # change auto > manual
sudo service postgres(tab-tab to determine if there are more than one) stop

On Ubuntu, watch out for multiple installs causing the postgres port to change from the default. If you have two previous postgres versions, version 12 will use port 5432+1+1 = 5434. This is very confusing. You can see what postgres versions are running on what port in Ubuntu with this:

pg_lsclusters

Configure

You need to create two config files, postgresql.conf and pg_hba.conf.

There is a great configuration tool here to generate postgresql.conf. Example result:

# DB Version: 10
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 16 GB
# CPUs num: 4
# Connections num: 1000
# Data Storage: hdd

max_connections = 1000
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 2097kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

You must provide the pg_hba.conf file to specify access control. Copy it from other installs. Make sure to add any host-to-host access you need.

Here is the location to put the config files when installed from source with [mh-install-postgres]:

/home/m/development/postgres/data/[postgresql.conf,pg_hba.conf]