September 30, 2014

Implementing SchemaSpy in your MySQL environment

SchemaSpyLately I have been working with a set of customers on a longer term basis which has given me time to explore new tools using their environments.  One tool that I am finding very helpful is called SchemaSpy.

SchemaSpy is a Java-based tool (requires Java 5 or higher) that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.

One of the things that I like about  ERD tools is that I can quickly generate a graphic representation of the database to see which tables are referenced the most, and which are candidates for dropping (why keep old data if you don’t need to?).  I get up to speed a lot faster and can contribute to the Development process a lot more effectively when I know the relationships between tables versus observing queries only.  An additional benefit is that SchemaSpy is command-line driven and builds the html after each run, so I find it convenient to set it up on crontab so that schema changes are automatically picked up.  Nice, eh?

Usage of this tool is very straightforward and it does a lot of the work for you.  Basically if your database has Foreign Keys then you’re laughing, if not don’t despair, you just have a bit more work ahead of you.  Most of the time SchemaSpy “does the right thing” but sometimes you need to give it a little help in the form of metadata files.

Installation of SchemaSpy and Dependencies

Creating a mysql.properties file

While not explicitly required, I prefer to create this once so that the  command line is shorter and neater.  In my case here is the properties file in use for my Percona Server 5.6.10 sandbox:

Example Schema

Notice that in the case of child_A there is an implicit FK relationship to parent via parent_id, however in child_B it is explicit.  This has an impact on how SchemaSpy identifies the relationship, and whether you need to provide a hint or not.

Running SchemaSpy (with Foreign Keys)

When you run schemaSpy without a metadata file, you will see that the relationship between parent and child_B is properly displayed:

FK_only

Table child_A can be made to display if you tag the Implied relationships box, but this generally only works if you have used the exact same name (in this case, parent_id) in the child and parent tables.  Your environment may be like a lot of shops that we see, where the parent table is known as id, and on the child tables they reference it as parent_id — this might make sense to Developers but it doesn’t help SchemaSpy at all.  This is where Metadata files come in to play

implied

Creating a Metadata File

The format of the metadata file is XML-based.  I have included the metadata file I used to properly link child_A to parent table:

If you’re looking for other examples you can find a more comprehensive example here.

Running SchemaSpy

The syntax below if specific for MySQL, but note that SchemaSpy works for pretty much all the popular RDBMS out there.

The -o directive tells SchemaSpy where to write the output to, make sure this is a directory that can then be accessed by a web server.  Also note that since I created the mysql.properties file in the same directory as the jar file, I don’t need to pass any host:port or schema name information.

Viewing SchemaSpy output

A completed Relationships view of these three tables should look like this:

after_metadata_application

Final Thoughts

While I find the Relationships tab the most useful component of SchemaSpy, I don’t want to leave you with the impression that this is the only component of the tool.  There are additional tabs:

  • Tables – Names, number of children, parents, count of columns, row counts, and comments — a great way for a high level overview of the table sizes and an easy way to use the search feature of your browser to zero in on a particular table
  • Constraints – lists the explicit Foreign Key constraints in the database (this does not include constraints identified via metadata files!)
  • Anomalies – Identifies possible relationships between columns/tables based on names, tables without indexes, columns flagged ‘nullable’ and ‘must be unique’ (woops!), single column tables, incrementing column names in tables, and tables with the string NULL instead of the actual SQL NULL value. This is basically a quick sanity check of your schema for any significant errors or items requiring review.
  • Columns – a listing of all columns in the schema, really handy to sort by name to see if you have any implicit constraints that you might have missed and can then write into your metadata file.
  • Donate – This is free software and John Currier asks for donations so he can justify the time spent maintaining SchemaSpy to his wife :)

Finally don’t forget to automate SchemaSpy via crontab once you’re done.

What are some ERD tools you use and how do they compare with SchemaSpy?  Feel free to answer via the comments.  Thanks for reading!

About Michael Coburn

Michael joined Percona as a Consultant in 2012 after having worked with high volume stock photography websites and email service provider platforms. WIth a foundation in Systems Administration, Michael enjoys working with SAN technologies and high availability solutions. A Canadian, Michael currently lives in the Raleigh, North Carolina area with his wife, two children, and two dogs.

Comments

  1. The ERD tools I use are MySQL Workbench and SchemaSpy. The anomalies tab of SchemaSpy has been quite helpful. However both tools can’t really cope with really large schemas.

  2. Michael Coburn says:

    Daniël, thank you for your feedback.

    * How do you define large schemas, and what do you mean by can’t cope?
    * Is this a concern regarding crashes of the tool, or certain features don’t work past N number of tables?

    In my experience, the schemas I have under SchemaSpy reach at max approximately 3,000 tables per schema and perhaps 10,000 tables per mysqld instance. SchemaSpy does take some time to execute but this is generally no more than a couple minutes. I haven’t tried pulling this schema into Workbench so I can’t comment as to how well it scales at this volume of tables.

  3. Joey Aldrin Cruz says:

    Good to hear that there is a tool like this, that can also integrate to MySQL. We used DBForge Studio For MySQL.
    As I did not already try to use SchemaSpy I can’t compare what is the difference between the two.
    I think the only difference for now is :

    SchemaSpy is a “FREE” software. And DBForge is not .(Although we can try to evaluate it for 30 days).
    I will try this tool and do some comparison. And to know what is the best to use?
    Although I did not try to use SchemaSpy and as I’ve read on this blog it has a limited components compare to DBforge
    But one of the best question that is on my mind now is :

    It is worth to pay for Express, Standard or Professional Edition of DBForge Studio?

    OR

    It is much better to use the free software like SchemaSpy?

    Well, we will see after some test regarding functionality etc;. ;)

  4. Gowtham says:

    Can Schema spy be used to detect differences in schema between two versions? Basically i would like to know if the schema has changed between builds

  5. Michael Coburn says:

    I’m not aware of this type of functionality built into SchemaSpy, for this you might need to look outside of SchemaSpy and instead look at doing diffs between mysqldump –no-data schemas.

Speak Your Mind

*