Why Data Modeling is Now Critical to SOA Success
By Jason Tiret, Director of Modeling and Architecture Solutions, Embarcadero
When SOA came on the scene, it promised to revolutionize how data is accessed within applications, across organizations and across the Web: essentially, anywhere it was needed.
Promoting the ultimate reuse of data and harnessing the rapid data growth were other promises of SOA. Rather than duplicating data from one system to another, SOA provided cleaner ways to access the data directly and reuse it. It was supposed to turn spaghetti-like webs of disparate systems with one-off, proprietary interfaces into an orchestrated access layer that could ask for data from anywhere and put data back seamlessly, while being more agile to changing business demands.
While SOA has accomplished this, it has also created some new challenges. How is this new data “source” documented? How is it governed? Who’s accountable for maintaining quality and traceability to the back-end databases? At some point, the data in the SOA layer or enterprise service bus has to end up back in the database. If no standards are leveraged in the SOA infrastructure, integrating and sharing data becomes problematic, and you risk landing right back where you started, with time and money wasted.
Data lives in more places than just databases. SOA has been invaluable in enabling its reuse and controlling the data redundancy that can plague organizations. The backbone of Web services and SOA is XML and, more specifically, XML schemas (XSD). XSD development still elicits images of the “wild, wild west,” where you build whatever you need with very little thought about reuse and standards. For the most part, XSDs have been created and managed by developers, not data architects. Developers typically work on one project at a time, and typically do not think about enterprise-wide standards or about ensuring that data stored in one place is defined the same way as similar data stored everywhere else.
As a result, not only can you have different representations of the same data in the SOA layer, but the version of the same data in the SOA layer can diverge greatly from data in source systems.
The XSD language also has different standards for how data is typed, and they provide far more freedom than database DDL. Precision and scale are optional on most data types. Maximum lengths are handled differently between similar data types such as strings, dates and integers. Primary key, foreign key and check constraints are also treated differently. This can lead to drastic differences between the structure of the XSD and the back-end databases. If the source and target rules are not carried over to the XSD definition, the result can be many errors or, even worse, data loss as payloads are passed between systems.
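As a hedged illustration of how a relational rule can silently disappear (the column name and 50-character length here are hypothetical, not from any real system), compare a bare string element with one that carries the database constraint over as a facet:

```xml
<!-- Relational side (hypothetical): customer_name VARCHAR(50) NOT NULL -->

<!-- Alternative 1: no facets. An xs:string is unbounded, so the
     50-character rule from the database is silently lost. -->
<xs:element name="CustomerName" type="xs:string"/>

<!-- Alternative 2: a restriction reimposes the database rule,
     so oversized payloads are rejected at the XSD layer instead
     of failing (or truncating) when written back to the database. -->
<xs:element name="CustomerName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

The two definitions accept very different payloads, yet both look like a plausible “CustomerName” to a developer working one project at a time.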
One approach to the issue is to involve data architects in XSD development. The architects can leverage their data models to create the XML structures, much as they use data models to create databases. Models have been used to govern database development for years, so why not use the same modeling processes on XSDs? Obviously, it will make your life easier to employ a data modeling tool that provides some level of custom mapping between the logical/physical model and XSD. Once that is in place, the architects can control the structure of the XML and reuse the same data elements and definitions for both database development and XML development.
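A sketch of what such a model-to-XSD mapping might produce, assuming a hypothetical Customer entity with three attributes (the entity, attribute names and types are illustrative only):

```xml
<!-- A hypothetical Customer entity from a logical model, generated
     as a named complex type so every message uses one definition -->
<xs:complexType name="CustomerType">
  <xs:sequence>
    <xs:element name="CustomerId"   type="xs:integer"/>
    <xs:element name="CustomerName" type="xs:string"/>
    <xs:element name="CreatedDate"  type="xs:date"/>
  </xs:sequence>
</xs:complexType>

<!-- Message schemas reuse the same definition the database was
     generated from, keeping the two structures in sync -->
<xs:element name="Customer" type="CustomerType"/>
```

The point is not the specific type names but the governance: the same modeled entity drives both the DDL and the XSD, so a change made in the model propagates to both.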
XML, by nature, is hierarchical. Meanwhile, the popular database platforms are relational. This presents new challenges for trying to use data models as the common language, since they are also mostly relational. However, it is what most data architects are familiar with. Throw an XSD development tool at them and they will be a fish out of water. But if you give them their data modeling tool, they’ll be happy as clams. They will be able to play by their rules. They will understand the layout, the notation and they will be able to apply their knowledge very quickly.
To find a happy medium, most architects use modeling tools to develop canonical data models to represent the XSDs. Canonical in mathematical terms means “of simplest or standard form,” and that is exactly what these models are. They rest somewhere between a logical and a conceptual model. Some parts are normalized and some parts are denormalized. Most of the time, the intention is not to generate a database or DDL from a canonical model; the intention is to reuse the data elements from the database-driven logical models in the XSD-driven canonical models. This provides two things. First, you can leverage your existing investment in your data modeling tool. Second, you will save a lot of time working in a familiar environment.
One of the most important keys to success is to let technology and software do the heavy lifting. Most sophisticated data modeling tools enable you to reuse data elements and break a large model into smaller submodels or subject areas. This is critical for aligning the canonical models with existing standards, as well as aligning parts of the canonical model with the messages that are passed between systems. Even better, data modeling tools will help you to selectively generate custom XSD code directly from the logical or physical model.
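One way this reuse can look in practice (the file names and type names below are hypothetical): shared data elements live in one schema, and each message schema includes it rather than redefining the types locally.

```xml
<!-- common-types.xsd (hypothetical): shared data elements,
     maintained once and reused by every message schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CustomerNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

<!-- order-message.xsd (hypothetical): includes the shared schema
     instead of declaring its own, possibly divergent, definition -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="common-types.xsd"/>
  <xs:element name="OrderCustomerName" type="CustomerNameType"/>
</xs:schema>
```

This mirrors the submodel idea: the shared schema plays the role of a common subject area, and each message schema maps to one submodel that includes it.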