Why Data Modeling is Now Critical to SOA Success

By Jason Tiret, Director of Modeling and Architecture Solutions, Embarcadero


When SOA came on the scene, it promised to revolutionize how data is accessed within applications, across organizations and across the Web; basically anywhere it was needed.

Promoting the reuse of data and harnessing rapid data growth were other promises of SOA. Rather than duplicating data from one system to another, SOA provided cleaner ways to access the data directly and reuse it. It was supposed to turn spaghetti-like webs of disparate systems with one-off, proprietary interfaces into an orchestrated access layer that could ask for data from anywhere and put data back seamlessly, while remaining agile in the face of changing business demands.

While SOA has accomplished much of this, it has also created some new challenges. How is this new data “source” documented? How is it governed? Who is accountable for maintaining quality and traceability to the back-end databases? At some point, the data in the SOA layer or enterprise service bus has to end up back in the database. If no standards are applied in the SOA infrastructure, integrating and sharing data becomes problematic, and you can end up right back where you started, with time and money wasted.

Data lives in more places than just databases. SOA has been invaluable in enabling its reuse and controlling the data redundancy that can plague organizations. The backbone of Web services and SOA is XML and, more specifically, XML schemas (XSD). XSD development still elicits images of the “wild, wild west,” where you build whatever you need with very little thought about reuse and standards. For the most part, XSDs have been created and managed by developers, not data architects. Developers typically work on one project at a time, and typically do not think about enterprise-wide standards or about ensuring that data stored in one place is defined the same way as like data stored everywhere else.

As a result, not only can you have different representations of the same data in the SOA layer, but the version of the same data in the SOA layer can diverge greatly from data in source systems.

The XSD language also has different standards for how data is typed, and those standards provide far more freedom than database DDL. Precision and scale are optional on most data types. Maximum lengths differ between like data types such as strings, dates and integers. Primary key, foreign key and check constraints are also treated differently. This can lead to drastic differences between the structure of the XSD and the back-end databases. If the source and target rules are not carried over to the XSD definition, it can cause many errors or, even worse, result in data loss as payloads are passed between systems.
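As an illustration (the column and element names here are invented for the example), consider a DDL column declared as `ORDER_TOTAL DECIMAL(9,2) NOT NULL` and two ways an XSD author might type the corresponding element:

```xml
<!-- Loose typing: any number of digits, any scale, and made
     optional even though the source column is NOT NULL -->
<xs:element name="OrderTotal" type="xs:decimal" minOccurs="0"/>

<!-- Typing that carries the database rules over: 9 total digits,
     2 fractional digits, element required -->
<xs:element name="OrderTotal">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="9"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

The first form validates payloads the target database will reject or silently truncate; the second keeps the XSD and the DDL in agreement.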

One approach to the issue is to involve data architects in XSD development. The architects can leverage their data models to create the XML structures, much as they use data models to create databases. Models have been used to govern database development for years, so why not use the same modeling processes on XSDs? Obviously, it will make your life easier to employ a data modeling tool that provides some level of custom mapping between the logical/physical model and the XSD. Once that is in place, the architects can control the structure of the XML and reuse the same data elements and definitions for both database development and XML development.

XML, by nature, is hierarchical. Meanwhile, the popular database platforms are relational. This presents new challenges for trying to use data models as the common language, since data models, too, are mostly relational. However, relational modeling is what most data architects are familiar with. Throw an XSD development tool at them and they will be a fish out of water. But give them their data modeling tool and they’ll be happy as clams. They will be able to play by their rules. They will understand the layout and the notation, and they will be able to apply their knowledge very quickly.

To find a happy medium, most architects use modeling tools to develop canonical data models to represent the XSDs. Canonical in mathematical terms means “of simplest or standard form,” and that is exactly what these models are. They rest somewhere between a logical and a conceptual model. Some parts are normalized and some parts are denormalized. Most of the time, the intention is never to generate a database or DDL from a canonical model. The intention is to reuse the data elements from the database-driven logical models in the XSD-driven canonical models. This provides two things. First, you can leverage your existing investment in your data modeling tool. Second, you will save a lot of time working in a familiar environment.
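For instance (the type and element names below are hypothetical), a canonical model might define a shared customer data element once, with the same definition and constraints used in the logical model, and reuse it across every message schema that carries customer data:

```xml
<!-- Canonical, shared definition of the customer data element,
     mirroring the logical model's entity and its constraints -->
<xs:complexType name="CustomerType">
  <xs:sequence>
    <xs:element name="CustomerId" type="xs:integer"/>
    <xs:element name="CustomerName">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:maxLength value="60"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:sequence>
</xs:complexType>

<!-- Reused in any message schema that needs customer data -->
<xs:element name="Customer" type="CustomerType"/>
```

Defining the element once and referencing it everywhere is what keeps like data defined the same way across the SOA layer.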

One of the most important keys to success is to let technology and software do the heavy lifting. Most sophisticated data modeling tools enable you to reuse data elements and break a large model into smaller submodels or subject areas. This is critical for aligning the canonical models with existing standards, as well as aligning parts of the canonical model with the messages that are passed between systems. Even better, data modeling tools can help you selectively generate custom XSD code directly from the logical or physical model.
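As a rough sketch of what that generation does under the hood (the function, type map and column names here are invented for illustration, not any particular tool’s API), mapping model columns to XSD element declarations while preserving length, precision and scale might look like this:

```python
# Hypothetical sketch: generate XSD element declarations from
# logical-model columns, carrying DDL constraints over as XSD facets.

TYPE_MAP = {
    "VARCHAR": "xs:string",
    "DECIMAL": "xs:decimal",
    "INTEGER": "xs:integer",
    "DATE":    "xs:date",
}

def column_to_xsd(name, sql_type, length=None, precision=None, scale=None):
    """Emit an <xs:element> that preserves the column's length/precision rules."""
    base = TYPE_MAP[sql_type]
    facets = []
    if sql_type == "VARCHAR" and length is not None:
        facets.append(f'<xs:maxLength value="{length}"/>')
    if sql_type == "DECIMAL":
        if precision is not None:
            facets.append(f'<xs:totalDigits value="{precision}"/>')
        if scale is not None:
            facets.append(f'<xs:fractionDigits value="{scale}"/>')
    if not facets:
        # No constraints to carry over: a simple typed element suffices.
        return f'<xs:element name="{name}" type="{base}"/>'
    body = "".join(facets)
    return (f'<xs:element name="{name}"><xs:simpleType>'
            f'<xs:restriction base="{base}">{body}</xs:restriction>'
            f'</xs:simpleType></xs:element>')
```

A real tool adds namespaces, documentation annotations and nesting, but the core idea is the same: the model, not hand-written XSD, is the source of the constraints.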


4 responses to “Why Data Modeling is Now Critical to SOA Success”

  1. I’m in total agreement, Jason, almost. We can generate XSDs from a relational model, but we also need to be able to:
    * modify enumerations, optionality, cardinality for some schemas
    * insert additional grouping elements
    * roll-up elements (e.g. subtypes into super-types)
    * combine attributes from a chain of entities into an element and/or sequence

    * combine types from multiple namespaces (and therefore multiple relational models)

    * provide documentation of the content of each schema (perhaps via HTML, or a repository browser)
    * provide a model of each schema for approval, review, documentation
    * provide impact analysis between the schema(s) and the model(s) from which they’re derived, allowing us to know what we have, and how it differs in each place we have it

    We can do the first four of these by generating schemas and saving the settings we used; we can then re-generate the schema in an identical fashion. We may also be able to support multiple namespaces in this way. However, we can’t do the rest unless we have dedicated XML models of the schemas. Some tools achieve this via a UML profile, others have dedicated XML models. Others have dedicated repositories. Until we can show developers and integration architects that we are actually in control of the schema models, they will always regard the actual XSD files as the master definitions, which is not what we (or I) would prefer.

    Do you have plans for XML-specific models?


  2. It would also be useful if the tool came pre-loaded with models of common integration standards, such as OAGIS. That would be a great selling point and time saver.

  3. Pingback: Data Modelling is Critical to SOA « George is a Metadata Junkie, he can't help it!

  4. As always, thanks for your feedback, George. We are investigating support for native XSD/XML models, although, we don’t currently have a timeframe. Regarding the OAGIS and other industry standards, that’s something we can definitely look into as well.
