Lately I have been working with a set of customers on a longer term basis which has given me time to explore new tools using their environments. One tool that I am finding very helpful is called SchemaSpy.
SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.
One of the things that I like about ERD tools is that I can quickly generate a graphic representation of the database to see which tables are referenced the most, and which are candidates for dropping (why keep old data if you don’t need to?). I get up to speed a lot faster and can contribute to the Development process a lot more effectively when I know the relationships between tables versus observing queries only. An additional benefit is that SchemaSpy is command-line driven and builds the html after each run, so I find it convenient to set it up on crontab so that schema changes are automatically picked up. Nice, eh?
Usage of this tool is very straightforward and it does a lot of the work for you. Basically if your database has Foreign Keys then you’re laughing, if not don’t despair, you just have a bit more work ahead of you. Most of the time SchemaSpy “does the right thing” but sometimes you need to give it a little help in the form of metadata files.
While not explicitly required, I prefer to create this once so that the command line is shorter and neater. In my case here is the properties file in use for my Percona Server 5.6.10 sandbox:
|
1 2 3 4 |
description=MySQL driver=com.mysql.jdbc.Driver connectionSpec=jdbc:mysql://127.0.0.1:5610/schemaspy driverPath=/usr/share/java/mysql-connector-java.jar |
|
1 2 3 4 |
CREATE TABLE `parent` ( `parent_id` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`parent_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
|
1 2 3 4 5 |
CREATE TABLE `child_A` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `parent_id` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
|
1 2 3 4 5 6 7 |
CREATE TABLE `child_B` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `parent_id` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `parent_id` (`parent_id`), CONSTRAINT `child_B_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`parent_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Notice that in the case of child_A there is an implicit FK relationship to parent via parent_id, however in child_B it is explicit. This has an impact on how SchemaSpy identifies the relationship, and whether you need to provide a hint or not.
When you run schemaSpy without a metadata file, you will see that the relationship between parent and child_B is properly displayed:
Table child_A can be made to display if you tag the Implied relationships box, but this generally only works if you have used the exact same name (in this case, parent_id) in the child and parent tables. Your environment may be like a lot of shops that we see, where the parent table is known as id, and on the child tables they reference it as parent_id — this might make sense to Developers but it doesn’t help SchemaSpy at all. This is where Metadata files come in to play

The format of the metadata file is XML-based. I have included the metadata file I used to properly link child_A to parent table:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
<?xml version="1.0" encoding="UTF-8"?> <schemaMeta xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schemaspy.sourceforge.net/xmlschema/2011/02/05/schemaspy.meta.xsd"> <comments> Main Production Database. Percona, Michael Coburn </comments> <tables> <table name="parent" comments="Parent table"> <column name="parent_id" primaryKey="true"> </column> </table> <table name="child_A" comments="Non-FK relationship, implicit relationship to parent"> <column name="id" primaryKey="true"> </column> <column name="parent_id"> <foreignKey table="parent" column="parent_id"/> </column> </table> </tables> </schemaMeta> |
If you’re looking for other examples you can find a more comprehensive example here.
The syntax below if specific for MySQL, but note that SchemaSpy works for pretty much all the popular RDBMS out there.
|
1 |
java -jar schemaSpy_5.0.0.jar -t mysql -u msandbox_ro -p msandbox -meta schemaspy.meta.xml -o /var/www/schemaspy/ |
The -o directive tells SchemaSpy where to write the output to, make sure this is a directory that can then be accessed by a web server. Also note that since I created the mysql.properties file in the same directory as the jar file, I don’t need to pass any host:port or schema name information.
A completed Relationships view of these three tables should look like this:
While I find the Relationships tab the most useful component of SchemaSpy, I don’t want to leave you with the impression that this is the only component of the tool. There are additional tabs:
Finally don’t forget to automate SchemaSpy via crontab once you’re done.
What are some ERD tools you use and how do they compare with SchemaSpy? Feel free to answer via the comments. Thanks for reading!