Migrating data with Liquibase

Recently, we started integrating Liquibase as a database schema migration tool into most of my team’s projects, for both new from-scratch projects and already existing ones. Liquibase is great because it allows us to use an SCM tool like Git to manage different revisions of an applications database schema – or more specifically, the changes required to migrate the database schema from one revision to another.
While migrating database schemas seems like a pretty straight-forward task at the beginning, things get more complicated as soon as you want to roll back schema changes without dropping your database (and then rebuilding it). Liquibase also supports migrating your data across schema changes, in both directions. But lets start with the basics.
For this example, I only used the Liquibase command line interface, along with the basic MySQL command line client. Of course, Liquibase also integrates nicely with Maven (as a Maven goal) or Spring (as a bean that executes during context initialization).
I start with a very basic table called „Person“, consisting only of an ID (primary key) and a name:

mysql> describe Person;
+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| id    | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| name  | varchar(255) | NO   | UNI | NULL    |                |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

Liquibase uses so-called changesets, which are XML-snippets used to describe DDL statements. They are organized in change log files. The following change set is used to create a table (via the „createTable“-tag) and two columns (via the „column“-tag)


  
    
      
        
        
      
      
        
        
      
    
  

When I run Liquibase via command line, it sets up the „Person“ table. The relevant command is „update“:

./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=db.changelog-0.1.0.xml update

Liquibase already knows how to roll back certain changesets, like the „createTable“ changeset above. If we call the command line client with „rollbackCount 1“ instead of „update“, it rolls back the last changeset it executed, and the „Person“ table is gone.
Other changesets cannot be rolled back automatically. Consider the following „insert“-changeset that inserts an entry into our „Person“ table:


  
    
      
      
    
    
      DELETE FROM Person WHERE name LIKE 'John Doe';
    
  

I manually added a „rollback“-tag containg an SQL statement that reverses the changset. Note that the „rollback“-tag can contain either SQL statements as text or certain Liquibase refactoring tags. Since we now have two change log xml files, I created a „master“-file that imports the other files in the order in which they should be executed:



	
	

If we run the „update“ command with the master change log file, it checks wether the first changeset was already executed (depending on you rolled it back or not) and then executes the second changeset that adds a „Person“ entry. To make this work, Liquibase create a helper table called „DATABASECHANGELOGS“ containg already-executed change sets along with a hash value (to make sure no-one modifies changesets once they have been executed):

mysql> select id, md5sum, description from DATABASECHANGELOG;
+--------+------------------------------------+--------------+
| id     | md5sum                             | description  |
+--------+------------------------------------+--------------+
| 1      | 3:5a36f447e90b35c3802cb6fe16cb12a7 | Create Table |
| init-1 | 3:43c29e0011ebfcfd9cfbbb8450179a41 | Insert Row   |
+--------+------------------------------------+--------------+
2 rows in set (0.00 sec)

Now that we got the basics running, lets try something more challenging: an actual change to our schema that requires both schema and data migration. Our „Person“ table currently has only a name column, and we decided that we want to split it up into a „firstname“ and a „lastname“ column.
Before beginning work, I have Liquibase „tag“ the database so that we can roll back to this tag later on:

./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=changelog-master.xml tag liquiblog_0_1_0

I created a new change set that adds the two new columns:


  
    
      
      
    
    
      
      
    
  

Once again, Liquibase knows how to roll back this change set, so we can skip the rollback tag.
Now that the table has two additional columns, we must take care of migrating our existing data to the new schema before deleting the old, now obsolete „name“ column. Since data manipulation is not supported out-of-the-box by Liquibase, we have to use its „sql“ tag to include native SQL statements within a changeset.


  
    UPDATE Person SET firstname = SUBSTRING_INDEX(name, ' ', 1);
    UPDATE Person SET lastname = SUBSTRING_INDEX(name, ' ', -1);
  
  
    UPDATE Person SET firstname = '';
    UPDATE Person SET lastname = '';
  

Note that the content of the „rollback“-tag is kind of redundant, but the tag itself is required because Liquibase prevents us from rolling back changesets that cannot be rolled back implicitly and have no explicit rollback tag.
Once again, after executing Liquibase with the „update“-option, the new changeset is run, and our newly-created „firstname“ and „lastname“ columns now contain data.
Finally, I want to remove the old „name“ column.


  
  
  
    
      
        
        
      
    
    
      UPDATE Person SET name = CONCAT(firstname, CONCAT(' ', lastname));
   
  

Again, the changeset itself is quite simple because Liquibase supports dropping columns, but the „rollback“-tag is more complicated: I first re-add the old „name“-column using the standart „addColumn“-tag, and then I used a custom SQL statement to set the columns value.
We end up with a new database schema, complete with data:

mysql> select * from Person;
+----+-----------+------------+
| id | firstname | lastname   |
+----+-----------+------------+
|  1 | John      | Doe        |
+----+-----------+------------+
1 rows in set (0.00 sec)

Because we created a tag earlier and included rollback instructions in all our changesets, we can always roll back these modifications without loosing any data! By running..

./liquibase --url=jdbc:mysql://localhost:3306/liquiblog --driver=com.mysql.jdbc.Driver --username=root --password="" --changeLogFile=changelog-master.xml rollback liquiblog_0_1_0

..we get our original database back!
Of course, the example with splitting / concatenating strings is a little far-fetched, but the same principles can be applied to more sophisticated refactorings. I came across the idea for this blog post when we had to split an existing domain class (mapped to a single table) into an abstract base class and two subclasses, preferrably without losing data.

Kommentare

  1. What I really like about Liquibase is the capability to create diffs from two databases. On the one hand this is a really nifty feature when it comes to upgrading a database, just to give you an better feeling of what will happen. On the other hand it allows you to do all your structural changes in the database (with SQL statements) and dump the corresponding changeset to disk afterwards.

  2. Very nice article and it's good to see that Liquibase is still being used at synyx!
    This reminds me of a workshop during Schule@synyx I held last year. Slides can be viewed on GitHub (<a href="http://joschi.github.com/liquibase-presentation/" title="
    Datenbankkonfigurations-Management mit Liquibase">
    Datenbankkonfigurations-Management mit Liquibase</a>) and there's also a <a href="https://github.com/joschi/liquibase-presentation/tree/master/demo" rel="nofollow">hands-on part</a> in it.