Strategies for Integrating Legacy Systems with a Common Business Object Model Implementation
Country Companies Insurance
1711 G.E. Road, Bloomington, Illinois 61702-2020
Ph: 309-821-3294 FAX: 309-821-2501
This paper will describe some strategies for integrating legacy systems with a generalized EJB component implementation that is based on a common business object model (BOM). The BOM can be an enterprise-wide company model, an industry-specific model (in our case, IBM’s Insurance Application Architecture), or a generalized model such as those stemming from OMG’s BODTF. I will first describe the particular problem I am targeting. Then I will describe several types of legacy systems that I need to integrate with. Finally, I will present some generalized integration patterns that can be used.
This paper targets a particular problem, summarized as follows. I have a generalized business object model (BOM) that will accommodate most of the legacy data I need access to, and I can extend it if needed. The BOM is highly reusable and is generalized to apply to the entire enterprise.
I plan to implement the BOM as EJBs, and I would like to use the BOM EJBs in a number of different applications. Many of these applications require data from legacy systems, but the legacy integration is generally application specific. I would therefore like to keep the legacy interaction code outside of the BOM EJBs. I would like to copy data out of a variety of legacy systems, work on that data within the BOM for a period of time, and then update the legacy systems with the final results. The fact that the BOM obtains data from and updates legacy systems should be transparent to the business clients of the BOM. How might I write integration modules that couple the BOM to various legacy systems?
Legacy systems can be classified in many ways. For the purposes of this paper, the most important classifications are communication timing (asynchronous or synchronous) and visibility and concurrency structure.
· Asynchronous: A communication where you do not wait for a response. If a response is needed, you either check back later to see whether one is available, or an event mechanism notifies you when a response is available.
· Synchronous: A communication where you wait for a response.
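The two timing styles can be contrasted with a small Java sketch. It assumes a hypothetical `FakeLegacySystem` that answers immediately. `CompletableFuture` from `java.util.concurrent` stands in for whatever asynchronous mechanism a real legacy system would use, such as a message queue or a batch log. Checking `isDone()` later corresponds to polling, and a registered callback corresponds to the event mechanism.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a legacy system; a real one might be CICS, IMS, or a batch job.
class FakeLegacySystem {
    String process(String request) {
        return "ACK:" + request;
    }
}

class LegacyCommunication {
    private final FakeLegacySystem legacy = new FakeLegacySystem();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Synchronous: the caller blocks until the response is available.
    String callAndWait(String request) {
        return legacy.process(request);
    }

    // Asynchronous: the caller receives a handle immediately and does not wait.
    CompletableFuture<String> send(String request) {
        return CompletableFuture.supplyAsync(() -> legacy.process(request), executor);
    }

    void shutdown() {
        executor.shutdown();
    }
}

class CommunicationTimingDemo {
    public static void main(String[] args) throws InterruptedException {
        LegacyCommunication comm = new LegacyCommunication();
        System.out.println("sync: " + comm.callAndWait("policy-42"));

        CompletableFuture<String> pending = comm.send("policy-43");
        // Event style: register to be notified when the response arrives.
        pending.thenAccept(r -> System.out.println("event: " + r));
        // Polling style: do other work, then check back later.
        while (!pending.isDone()) {
            Thread.sleep(10);
        }
        System.out.println("polled: " + pending.join());
        comm.shutdown();
    }
}
```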
Visibility means the point in time at which data can be seen or read by the enterprise (e.g., prior to publishing, a novel is visible only to the author and the editor; after publishing, it is visible to the general public). Concurrency means the manner in which multiple people are allowed to access the same data simultaneously.
· Read only: Legacy integration with the BOM where the legacy data is read but not updated by the BOM. Either the legacy data is static, or the BOM is not concerned with how current the data is or whether it has changed since the last read.
· Pending file and master file: A legacy structure often found in batch systems or in systems where workflow managers are involved. Data for in-progress transactions is kept in a pending file, separate from the master file that holds data from completed transactions. Data in the pending file has very limited visibility. Data in the master file is visible to the enterprise and is the “official” data at any point in time.
· Segmented data: All data is stored in the same file or database, structured so that two people are unlikely to update the same data at the same time. Segmented data legacy systems are generally non-normalized and contain a lot of redundant data. All data is visible to the enterprise, but the data structure is such that you should not need to look outside of your segment.
· Manually serialized access: The legacy system is not structured for concurrent update and has few, if any, controls to keep multi-user updates from stomping on each other. To avoid concurrent update problems, the business has devised manual procedures that control the update process to minimize or eliminate conflicts (access is serialized). This is often done by physically passing around a token (often a file folder); you are not allowed to work on data unless you physically hold the token. All data is visible to the enterprise, but the intent is that data within a particular scope be viewed only by those holding the token for that scope.
· Access-module transactions: Often found in CICS or IMS systems. The scope of a transaction is controlled by the designer of the access module, and each invocation of the access module represents a transaction boundary. Data is visible to the enterprise after each invocation.
· Client-controlled transactions: Generally found in client/server systems based on relational databases. The client can begin and end a transaction, and multiple accesses can take place within the scope of a single transaction. Data is visible only to the client while the transaction is in progress and becomes visible to the enterprise upon commit.
For this paper, push and pull strategies are defined from the point of view of BOM initialization. If you populate the BOM before a business client needs the data, you are pushing. If population of the BOM is triggered by a business client requesting data, you are pulling.
Push-style integration components act as clients to the BOM. They generally assume that someone knows exactly what data a business client of the BOM needs and when it is needed. Data usually moves in very large chunks. The push clients themselves hold no persistent state data; they contain function only.
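The push/pull distinction can be sketched as a hypothetical `BomStore` that business clients query through a single `find` method. A push client may have filled the store in advance, or a pull adapter may fill it on demand. The caller cannot tell which happened, which matches the transparency goal stated earlier. `Party`, `BomStore`, and the adapter function are illustrative names only, not EJB APIs. A real implementation would sit behind EJB finders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical BOM object; a real one would be an EJB.
class Party {
    final String id;
    final String name;

    Party(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class BomStore {
    private final Map<String, Party> objects = new HashMap<>();
    private final Function<String, Party> pullAdapter; // null = no lazy population
    int legacyReads = 0;                                // number of pull-adapter calls

    BomStore(Function<String, Party> pullAdapter) {
        this.pullAdapter = pullAdapter;
    }

    // Push: an external client populates the BOM before any business client asks.
    void push(Party p) {
        objects.put(p.id, p);
    }

    // Business clients only call find(); they cannot tell how the object got here.
    Party find(String id) {
        Party p = objects.get(id);
        if (p == null && pullAdapter != null) {
            // Pull: the request itself triggers population from legacy.
            legacyReads++;
            p = pullAdapter.apply(id);
            if (p != null) {
                objects.put(id, p);
            }
        }
        return p;
    }
}

class PushPullDemo {
    public static void main(String[] args) {
        BomStore pushed = new BomStore(null);
        pushed.push(new Party("P1", "Smith"));      // populated ahead of time
        System.out.println(pushed.find("P1").name);

        BomStore pulled = new BomStore(id -> new Party(id, "from legacy " + id));
        System.out.println(pulled.find("P2").name); // populated on first request
        pulled.find("P2");                          // served from the BOM
        System.out.println("legacy reads: " + pulled.legacyReads);
    }
}
```

In this sketch, the second `find` on the pull-populated store does not touch legacy. This is the lazy population that the pull-style components described below rely on.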
· Initial population client: Can be triggered manually, with a human supplying parameters that identify which legacy data to retrieve, or scheduled as a batch job that examines newly arrived data in legacy systems. Granularity is usually large (e.g., get all of the data associated with a life insurance application and populate the BOM with it). Attempts to read all of the data the business clients of the BOM need to perform their function. Typically populates the BOM through the same API (application programming interface) as the business clients. Controls the transactional scope of the BOM interaction; if a synchronous legacy interaction fails, it can roll back the BOM transaction.
· Synchronization Client: Can be triggered manually, but is usually scheduled as a batch job. Examines data in both the legacy system and the BOM, compares values, and decides whether either side needs to be updated. The amount of data read can be large; the amount updated is usually small. Granularity varies depending on whether you have one client or several that each handle specific pieces. Controls the transactional scope of the BOM interaction; if a synchronous legacy read fails, it can roll back the BOM transaction. Whether the legacy system can be rolled back depends on its transactional capabilities.
· Update Client: Can be triggered manually, but is usually scheduled as a batch job. Usually examines BOM data to determine whether a particular status (e.g., complete, cancelled) has been reached, then updates the legacy system and removes the data from the BOM. Needs to be intelligent enough not to remove shared data. Legacy updates and BOM deletes are usually done as a pseudo two-phase commit: deletes do not take place unless the legacy updates have succeeded. The legacy updates most likely to fail are done first so that legacy is not left half updated. If a legacy update does fail, either compensating transactions can be used or humans can be contacted to fix it (usually through an exception log). If any of the legacy updates are asynchronous, a response client will likely be involved (see below), and deletion of BOM data remains pending until all asynchronous responses have been resolved.
· Response Polling Client: Either a constantly running task or a scheduled batch job. Collects responses from asynchronous interactions with legacy systems, examines them, and decides what to do (e.g., read the transaction log tape from the nightly legacy batch update, determine whether any of the updates originating from BOM transactions failed, cancel the delete of BOM data if they did, and make an entry in the exception log so a human can fix the situation).
· Response Event Client: Basically the same as a Response Polling Client, but used in situations where a system event can be received. Most often this is the arrival of a message on a message queue, and the message queue software starts up the Response Event Client.
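The Update Client's pseudo two-phase commit can be sketched as follows. This sketch makes several assumptions:

1. Each legacy update is modeled by a hypothetical `LegacyUpdate` interface. The interface reports a `riskRank`, where a higher value means the update is more likely to fail, and offers a `compensate` method.
2. The BOM is a plain `Map`.
3. Shared data is identified by a caller-supplied set of keys.

It covers only synchronous legacy updates. Asynchronous updates would instead leave the delete pending for a response client.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction of one update against one legacy system.
interface LegacyUpdate {
    int riskRank();     // assumed: higher = more likely to fail, so it runs earlier
    boolean apply();    // true if the legacy update succeeded
    void compensate();  // compensating transaction that undoes a successful apply()
}

class UpdateClient {
    final List<String> exceptionLog = new ArrayList<>();

    // Pseudo two-phase commit: all legacy updates first, BOM delete only if every one succeeds.
    boolean close(String bomKey, List<LegacyUpdate> updates,
                  Map<String, Object> bom, Set<String> sharedKeys) {
        List<LegacyUpdate> ordered = new ArrayList<>(updates);
        ordered.sort(Comparator.comparingInt(LegacyUpdate::riskRank).reversed());

        List<LegacyUpdate> applied = new ArrayList<>();
        for (LegacyUpdate u : ordered) {
            if (u.apply()) {
                applied.add(u);
                continue;
            }
            // Undo what already succeeded (most recent first) and leave a note for a human.
            for (int i = applied.size() - 1; i >= 0; i--) {
                applied.get(i).compensate();
            }
            exceptionLog.add("legacy update failed for " + bomKey);
            return false; // BOM data is kept
        }
        if (!sharedKeys.contains(bomKey)) {
            bom.remove(bomKey); // shared data stays in the BOM for other work
        }
        return true;
    }
}

// Test double used by the demo; not part of the pattern itself.
class StubUpdate implements LegacyUpdate {
    final int risk;
    final boolean succeeds;
    boolean applied;
    boolean compensated;

    StubUpdate(int risk, boolean succeeds) {
        this.risk = risk;
        this.succeeds = succeeds;
    }

    public int riskRank() { return risk; }
    public boolean apply() { applied = succeeds; return succeeds; }
    public void compensate() { compensated = true; }
}

class UpdateClientDemo {
    public static void main(String[] args) {
        Map<String, Object> bom = new HashMap<>();
        bom.put("APP-1", "completed life insurance application");
        UpdateClient client = new UpdateClient();

        StubUpdate safe = new StubUpdate(1, true);
        StubUpdate risky = new StubUpdate(5, false); // runs first and fails
        boolean ok = client.close("APP-1", List.of(safe, risky), bom, Set.of());

        System.out.println(ok + " " + bom.containsKey("APP-1") + " " + client.exceptionLog);
    }
}
```

Running the riskiest update first means that, in the demo, the failure is detected before the safe update ever touches legacy, so nothing needs to be compensated.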
Pull-style integration components are buried at the lowest levels of the server. They take the view of lazy population of the BOM, and data usually moves in smaller chunks. Components may maintain persistent state data if point-in-time snapshots of legacy data are needed so that concurrent update issues can be detected and dealt with. For example, the pull component may keep a copy of the legacy data as of the point in time it was copied to the BOM. When it is time (possibly weeks later) to update the legacy system, it checks whether other users have changed the legacy data by comparing the snapshot with the current legacy data. If there is a discrepancy, it decides what to do about it.
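The snapshot comparison described above might look like this in its simplest form. The sketch assumes a hypothetical `LegacyFile` modeled as an in-memory map, and it also keeps snapshots in memory. A real pull component would need to persist them, since the write-back may happen weeks later. On a conflict, this sketch simply refuses to overwrite and returns `false`, which is only one of several possible policies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory stand-in for a legacy file keyed by record id.
class LegacyFile {
    final Map<String, String> records = new HashMap<>();
}

class SnapshotAdapter {
    private final LegacyFile legacy;
    // Assumed in-memory; a real adapter would persist snapshots across sessions.
    private final Map<String, String> snapshots = new HashMap<>();

    SnapshotAdapter(LegacyFile legacy) {
        this.legacy = legacy;
    }

    // Copy a legacy value into the BOM and remember what legacy looked like at that moment.
    String pull(String key) {
        String value = legacy.records.get(key);
        snapshots.put(key, value);
        return value;
    }

    // Write the BOM value back only if nobody changed the legacy record since the pull.
    boolean writeBack(String key, String bomValue) {
        if (!snapshots.containsKey(key)) {
            throw new IllegalStateException("no snapshot for " + key);
        }
        String current = legacy.records.get(key);
        if (!Objects.equals(current, snapshots.get(key))) {
            return false; // concurrent update detected; leave legacy untouched
        }
        legacy.records.put(key, bomValue);
        snapshots.remove(key);
        return true;
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        LegacyFile legacy = new LegacyFile();
        legacy.records.put("CLAIM-1", "status=open");
        SnapshotAdapter adapter = new SnapshotAdapter(legacy);

        adapter.pull("CLAIM-1");
        legacy.records.put("CLAIM-1", "status=reopened"); // another user edits legacy
        boolean ok = adapter.writeBack("CLAIM-1", "status=closed");
        System.out.println(ok + " " + legacy.records.get("CLAIM-1"));
    }
}
```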
· Adapter per BOM class: For each BOM class, there is a matching legacy adapter class, which can access one or more legacy systems. Pulls data from legacy one object at a time. Works well if the BOM and legacy structures are very close. May be patterned and partially generatable. Can be slow because of the fine granularity of interaction with the legacy system.
· Adapter per legacy structure: For each legacy structure you are wrapping, there is a matching adapter class. Works well for keeping snapshot copies of legacy data. May be patterned and partially generatable. Can be slow because of the fine granularity of interaction with the legacy system.
· Business function adapter: Structured around the business function you are trying to perform. Several BOM classes could potentially trigger it. Once triggered, it attempts to get all (or a consistent subset) of the data expected to be needed. Can be efficient because of its coarse-grained interaction with the legacy systems.
· Access pattern adapter: Similar to the business function adapter, but structured around access patterns: if a particular object is touched, you can anticipate that other nearby objects will also be needed. Can be efficient because of its coarse-grained interaction with the legacy systems. May be more maintainable because access patterns change more slowly than application requirements.
· Pass-through adapter: The BOM is intended to contain only a small subset of the total data in the legacy systems. If a find request such as “give me all of the people named Smith” required instantiating 5000 Smiths in the BOM, this could pose a problem, especially if the business client then selected one of them and ignored the other 4999 (which is the normal situation). This adapter passes data from the legacy system through to the BOM business client without instantiating objects within the BOM itself. Once the business client has selected the object it wants to work with, a single-item “find” is used with one of the other adapter types to instantiate it in the BOM.
· Unit of Work (UOW) object: If the BOM implements a long-running business transaction, a UOW object can keep track of all objects that have participated in the scope of the transaction. When the transaction is committed, a Legacy Update Coordination Object (see below) can move data from all of the participants back to the legacy system.
· Legacy Update Coordination Object: Triggered by the commit of a UOW, by an explicit call from a business client, or by a scheduled batch job. Determines which objects in the BOM should be used to update the legacy systems. If a UOW object is used, this is simply the list of participants; if not, it will likely examine some status field and then select other objects based on business requirements. Implements a pseudo two-phase commit with legacy as discussed under “Update Client” above. May interact with a Response Polling Client or a Response Event Client (see above).
· Synchronization adapter: Triggered by an activity in the BOM (such as specific data changes) or by a scheduled batch job. Performs the same function as the “Synchronization Client” above; the difference is that data exchange takes place at the lowest levels of the server, possibly using APIs specific to its function. Legacy snapshot data may also have been kept so that concurrent updates can be dealt with.
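A UOW object cooperates with a Legacy Update Coordination Object, and that cooperation can be sketched as below. The following are simplifying assumptions for illustration:

- the names `Participant`, `UnitOfWork`, and `LegacyUpdateCoordinator`;
- the idea that each BOM object exposes a single key/value pair for legacy.

The coordinator here just copies participant data into an in-memory map. It omits the pseudo two-phase commit and the response handling described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical view of a BOM object as a single legacy record.
interface Participant {
    String legacyKey();
    String legacyValue();
}

class SimpleParticipant implements Participant {
    private final String key;
    private String value;

    SimpleParticipant(String key, String value) {
        this.key = key;
        this.value = value;
    }

    void setValue(String value) { this.value = value; }
    public String legacyKey() { return key; }
    public String legacyValue() { return value; }
}

class UnitOfWork {
    // Insertion-ordered; an object touched twice is recorded once.
    private final Set<Participant> participants = new LinkedHashSet<>();

    // Called whenever a BOM object takes part in the long-running business transaction.
    void enlist(Participant p) {
        participants.add(p);
    }

    // On commit, hand the participant list to the coordinator, then start fresh.
    void commit(LegacyUpdateCoordinator coordinator) {
        coordinator.coordinate(Collections.unmodifiableSet(participants));
        participants.clear();
    }
}

// Moves data from every participant back to a simulated legacy system.
class LegacyUpdateCoordinator {
    final Map<String, String> legacy = new HashMap<>();
    final List<String> written = new ArrayList<>();

    void coordinate(Set<Participant> participants) {
        for (Participant p : participants) {
            legacy.put(p.legacyKey(), p.legacyValue());
            written.add(p.legacyKey());
        }
    }
}

class UowDemo {
    public static void main(String[] args) {
        UnitOfWork uow = new UnitOfWork();
        SimpleParticipant policy = new SimpleParticipant("POL-9", "premium=100");
        SimpleParticipant insured = new SimpleParticipant("PTY-3", "name=Smith");

        uow.enlist(policy);
        uow.enlist(insured);
        policy.setValue("premium=120"); // changed later in the same business transaction
        uow.enlist(policy);             // already enlisted; not duplicated

        LegacyUpdateCoordinator coordinator = new LegacyUpdateCoordinator();
        uow.commit(coordinator);
        System.out.println(coordinator.written + " " + coordinator.legacy.get("POL-9"));
    }
}
```

The coordinator reads participant values only at commit time. As a result, legacy receives the final state of each object rather than any intermediate state.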