Friday, 10 June 2011

XML and relational databases

A problem I seem to keep to getting asked for help with is how to persist XML data to a relational database. (I’ve now been asked to help on at least four different projects where this has become something of an issue/blocker although strangely I’ve never actually needed to do this myself)

The context of this is usually services – I’ve seen this in REST and SOAP based services – not a debate I’m going to enter into here.

Note that this post is concerned with traditional relational databases and does not cover using the XRX (XForms, REST, Xquery) model or XML databases such as MarkLogic or eXist.

The first thing I recommend doing is to ensure that you have an XML schema. I think that it’s good practice and makes you think about the design of your XML structures. If you’re more familiar with designing databases or programmatic data structures then this will help you think about your XML in the same way and save you problems later on. It’s all too easy to end up with the same element name used for different things or common data structures with slightly different names if you just use well-formed XML.

What’s the best way to do this? Well as usual it depends…

Java driven approach


Here the starting point is your java beans.
You can use annotations for both JPA, to define the object relational mapping, and JAXB to define the object XML mapping.
http://www.oracle.com/technetwork/articles/marx-jse6-090753.html

XML driven approach



Here the XML is the primary component. Information about the structure of the data is maintained in the schema together with the JPA annotations.
Using the schema annotations it is possible to customize the way that the java beans are generated.
This is a nice approach because all the information about the structure of the data is held in the same place and everything can be easily regenerated and you don’t have to worry about losing any changes made to the generated classes.
You will, of course, need to ensure that your database schema stays in sync with the JPA definitions.
Option #1 - Hyperjaxb3

http://confluence.highsource.org/display/HJ3/Home

Option #2 - Use Dali to map your POJOs to Database (JPA)

The Eclipse Dali tool provides tooling to easily map your POJOs to a relational database using JPA:

http://www.eclipse.org/webtools/dali/

Option #3 - Use EclipseLink

EclipseLink provides both JPA and JAXB implementations. The JAXB implementation (MOXy) contains extensions specifically for handling JPA entities:

http://wiki.eclipse.org/EclipseLink/Examples/MOXy/JPA
Other links
http://www.slideshare.net/shaunmsmith/restful-services-with-jaxb-and-jpa
http://blog.hma-info.de/2008/05/15/hyperjaxb-3-the-fastest-way-from-xml-to-db/

http://blogs.bytecode.com.au/glen/2010/07/29/from-wsdl-to-jaxb-to-jpa-with-a-single-schema--adventures-in-hyperjaxb3.html

XML driven approach - a variation



An alternative approach to annotating the XML schema is to use aspect orientated programming and place the JPA annotations in the AspectJ files (a bit like how Spring Roo works)

Database driven approach



In this approach the database is the most important component in the system – this is a good approach if you’ve got an existing, stable database and want to be able to quickly add XML capabilities e.g. to provide a service based interface

Create JPA beans



There are a number of ways to do this – you can of course write the beans by hand but it’s easier to generate them. You can do this using Spring Roo (see earlier blog post) or Eclipse Dali

Mapping JPA to XML



The obvious way to map between XML and Java beans is to use JAXB.
One approach is to generate the JAXB beans and write a custom mapping to the JPA beans however there is a better way.
If you use the MOXy JAXB implementation then there are some extensions which you can use :
• XPath Based Mapping
• JPA Entities to XML - Bidirectional Relationships
These extensions allow you to use annotations on your JPA beans to describe your JAXB mappings.
N.B. If you regenerate your JPA beans then these annotations will be lost – a major disadvantage to this approach

A detailed example of how to do this is at:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html


The common approach



I’ve called in this because, in my experience, this is what most java coders do. In fact this covers several different approaches but at heart they are very similar.

Parse the XML into java beans



Lots of ways to do this – JAXB, using DOM or SAX by hand, if you’ve a well written WSDL file for a SOAP service then this will be part of the generated code.

Persist the java beans


Again lots of ways to do this – hand written JDBC calls, Hibernate…

The simple approach



This is a simple approach which might be sufficient if performance is not too much of an issue.
This could be a good approach if you’ve got some relatively complex XML, which you want to store, but are not interested in much of the actual content of the XML.
It is easy enough to store XML documents as BLOB and all the signification databases now support XPATH querying of the data (Check your database for specifics of how to do this e.g. MySQL at http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html#xml-5.1-xpath)
If you are using this approach then it’s probably a good idea to use stored procedures so that if you need to migrate to a more structured data model then this can be accomplished with minor application changes.
This approach can be enhanced by extracting specific elements into structured columns so that if there is a particular element that you want to query you can do it in the traditional manner.

ETL Tools



If you’ve got the tools and the experience in using them then ETL tools can provide a graphically driven interface to produce the mapping between the XML and the database schema. Talend, Oracle Data Integrator, IBM Data Stage, Informatica etc all provide this and can expose service endpoints for interactions. If you’ve got the tools and expertise in using them then it’s worth considering this approach.

I think this post is probably long enough for now…