= databases == mysql \g SHOW DATABASES; \r phplist; SHOW TABLES; == postgresql apt install postgresql-client sudo -u postgres psql -d knuth_db -U postgres -W -f ./init_db.sql sudo -u postgres psql -d knuth_db -U postgres -W psql -U -h -p -w ... no-password -W ... force password prompt \connect # connect to a database \? # help for \ commands \d # describe name \du # list roles \dn # list schemas \l # list databases \dt # list tables \d # list tables, views, and sequences === postgresql via Hans-Jürgen Schönig (GLT 2018) ==== moving average (with 5 rows window) time=# SELECT year, production, consumption, avg(production) OVER (ORDER BY year ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM t_oil WHERE country = 'USA' ; ==== get decade (1950, 1960, 1970, ...) of year (1954, 1961, 1977, ...) test=# SELECT *, round(year - 5, -1) as decade FROM t_oil WHERE country = 'USA'; ==== lag to push one column down by 1 row SELECT *, t - lag(t, 1) OVER (ORDER BY t) AS diff FROM t_series; === Clustering clustering Primary node will read/write with application Standby nodes will replicate data of Primary node off-site backups, monitoring, alerting, testing, training, documentation ... is required availability per year: 90% ... 36.35 days down 99% ... 3.65 days 99.9% ... 8.8h 99.99% ... 52.6min 99.999% ... 5.3min auto failover := standby node becomes Primary node and takes over work if Primary node fails RTO := recovery time objective == amount of time acceptable to take to recover entire database RPO := recovery point objective == amount of data acceptable to lose in case of a crash Shared-Everything architecture := all nodes have the same roles, read/write operations can go to any node !! requires load balancing for read/write !! conflict resolution must be done at application layer synchronous replication := acknowledgement of replica is waited for before commit asynchronous replication := acknowledgement of replica is not waited for before commit; scales better; data may be inconsistent Split Brain := more than one node thinks it is a Primary Node or because of lack of connectivity all nodes function independently !! monitoring and alerting required ++ concepts: {cluster manager, quorum-based decision making, witness server to resolve a tie} 1. disable application traffic 2. isolate nodes 3. identify most up-to-date node (logs are crucial: transaction IDs, timestamps, ...) 4. restore data consistency (older nodes shoud get data from most up-to-date one) 5. assign most up-to-date node as Primary 6. verify cluster integrity and working replication 7. enable application traffic 8. post mortem analysis Tools helping to manager PostgreSQL clusters: {repmgr, pgpool-II, Patroni}