Thursday, April 19, 2012

A few hacks to simulate mysqldump --ignore-database

A few days ago, Ronald Bradford asked for a mysqldump –ignore-database option.

As a workaround, he proposes:
mysqldump --databases `mysql --skip-column-names \
   -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') \
   FROM information_schema.schemata WHERE schema_name \
   NOT IN ('mysql','performance_schema','information_schema');" \
   >` >/mysql/backup/rds2.sql

It's a clever solution, but unfortunately it only works if you have a handful of schemas. If your databases happens to have several dozens (or hundreds or thousands) of schemas (which is where you need this option more), then the output will be truncated to the length of group_concat_max_len (by default, 1024.)

There are two alternative methods.

The all-shell method

This method lets shell commands filter the wanted databases. In its simplest way, it goes
DATABASE_LIST=$(mysql -NBe 'show schemas' | grep -wv 'mysql\|personnel\|buildings')
mysqldump --all-databases $DATABASE_LIST

Notice that, when you use --all-databases, information_schema and performance_schema are filtered off by default.

This method works, because the default length of the command line in Unix is much longer than group_concat_max_len:

$ getconf ARG_MAX
131072  # Linux

$ getconf ARG_MAX
262144  # Mac OSX

(in Windows it's much shorter: 8191, but since I haven't used Windows for ages, I don't really care).

A more elaborate method would require a scripting wrapper around the above instructions, but I think that as it is, it's simple enough to be remembered.

The options file method

If the length of all your database names combined is more than the maximum allowed by the shell, and all you want is filtering a few databases off a huge list, there is still hope.

Let's assume that you want to ignore mysql, personnel, and buildings from your backup.

echo '[mysqldump]' > mydump.cnf
mysql -NBe "select concat('ignore-table=', table_schema, '.', table_name) \
  from information_schema.tables \
  where table_schema in ('mysql', 'personnel', 'buildings')" \
  >> mydump.cnf

Now the options file looks like this:

[mysqldump]
ignore-table=mysql.db
ignore-table=mysql.host
ignore-table=mysql.user
[...]
What we need to do is tell mysqldump to get its information from this options file, and it will duly skip all the tables that are listed in there.
mysqldump --defaults-file=./mydump.cnf  -u $DBUSER -p$DBPWD --all-databases
There are two drawbacks with this approach:
  • There will be a DROP DATABASE IF EXISTS and CREATE DATABASE for each of the excluded schemas, although no tables will be dumped.
  • This method only works with --default-options-file. Theoretically, it should also work with --default-extra-file, but it doesn't. Therefore, if you are relying on an options file for connection parameters, they should be added to this file or listed in the command line.
Summing up, there is no perfect solution, but there are enough hacks available that you probably would find what suits you.

10 comments:

Cédric said...

"exclude-pattern" option is also available in ZRM community for ignore some databases

Giuseppe Maxia said...

@Cédric,
ZRM is hardly an ubiquitous tool. The original question was about using mysqldump (which is available everywhere, including in many cloud-based services).

Cédric said...

Sure, I just wanted to inform about this alternative.
Sometimes, I regret that ZRM Community is not used more.
But, look at this tweet from Zmanda during the last perconalive :
"It was great to see many of the Zmanda Recovery Manager for MySQL customers (and users of the community edition) at #PerconaLive. Thank You!"

Maciek said...

Giuseppe,

However group_concat_max_len can be easily tuned for the schema reading query. For example:

mysql --skip-column-names \
-e "SET group_concat_max_len=10000000; SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') ..."

Maciek

Anonymous said...

Maciek,
Keep in mind group_concat_max_len is also bounded by max_allowed_packet, which must also be set

Anonymous said...

Great tip on using backticks and sql not in for concat schemas. Thanks!

Tomasz said...

I had to change:

mysqldump --all-databases $DATABASE_LIST
to
mysqldump --databases $DATABASE_LIST

to get it to work, but otherwise thanks for the great tip!

Anonymous said...

I love the single line solution that was posted here: http://stackoverflow.com/a/11007945/955858

The command is:

echo 'show databases;' | mysql -uroot -proot | grep -v ^Database$ | grep -v ^information_schema$ | grep -v ^mysql$ | grep -v -F db1 | xargs mysqldump -uroot -proot --databases > all.sql

(Ignore information_schema, mysql, and db1).

Anonymous said...

As Tomasz said, it shouldn't be --all-databases but instead just --databases. Also I needed to use a root login, so it ended up being:

DATABASE_LIST=$(mysql -uroot -p -NBe 'show schemas' | grep -wv 'mysql\|peformance_schema\|information_schema')
mysqldump --databases $DATABASE_LIST > ~/all-databases.sql

Unknown said...

Hello Giuseppe, I did try this solution
(mysqldump --all-databases $DATABASE_LIST),

and I think we should fix that, because --all-databases it's already included all databases with filter and we will get in this case into our dump.sql file something like that:

Usage: mysqldump [OPTIONS] database [tables]
OR mysqldump [OPTIONS] --databases [OPTIONS] DB1 [DB2 DB3...]
OR mysqldump [OPTIONS] --all-databases [OPTIONS]
For more options, use mysqldump --help

if we gonna use $DATABASE_LIST, we should just use:
mysqldump —databases $DATABASE_LIST