Strategies for Integrating Legacy Systems with a Common Business Object Model Implementation

Kevin Rasmus
Country Companies Insurance
1711 G.E. Road, Bloomington, Illinois 61702-2020
Ph: 309-821-3294  FAX: 309-821-2501
This paper describes strategies for integrating legacy systems with a generalized EJB component implementation that is based on a common business object model (BOM). The BOM can be an enterprise-wide company model, an industry-specific model (in our case, IBM's Insurance Application Architecture), or a generalized model such as those stemming from the OMG's BODTF. I will first describe the particular problem I am targeting, then describe several types of legacy systems that I want to integrate with, and finally present some generalized integration patterns that can be used.
This paper is targeted at a particular problem, which can be summarized as follows. I have a generalized business object model (BOM) that will accommodate most of the legacy data that I need access to, and I can extend it if needed. The BOM is highly reusable and is generalized to apply to the entire enterprise. I plan to implement the BOM as EJBs, and I would like to use the BOM EJBs in a number of different applications. Many of these applications require data from legacy systems, but the legacy integration is generally application specific, so I would like to externalize the legacy interaction code from the BOM EJBs. I would like to copy data out of a variety of legacy systems, work on that data within the BOM for a period of time, and then update the legacy systems with the final results. The fact that the BOM obtains data from and updates legacy systems should be transparent to the business clients of the BOM. How might I write integration modules that couple the BOM to various legacy systems?
Legacy systems can be classified in many ways. For the purposes of this paper, two dimensions are most important: communication timing, and visibility and concurrency structures.
· A communication where you do not wait for a response (asynchronous). If a response is needed, you will either check back later to see if one is available, or there will be an event mechanism that notifies you when a response is available.
· A communication where you wait for a response (synchronous).
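The contrast between the two timing styles can be sketched in Java as follows; the channel types and the simulated "ACK" response are illustrative assumptions, not part of any particular legacy API:

```java
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Synchronous: the caller blocks until the legacy system answers.
interface SyncLegacyChannel {
    String call(String request);
}

// Asynchronous: the caller sends and returns immediately; responses
// are either polled for later or delivered via an event mechanism.
class AsyncLegacyChannel {
    private final Queue<String> responses = new ConcurrentLinkedQueue<>();

    void send(String request) {
        // A real channel would enqueue the request to the legacy side;
        // here we simulate an eventual response arriving.
        responses.add("ACK:" + request);
    }

    // "Check back later": returns a response if one has arrived.
    Optional<String> poll() {
        return Optional.ofNullable(responses.poll());
    }
}
```

The polling variant shown here corresponds to the "check back later" option; an event mechanism would instead invoke a registered listener when a response arrives.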
Visibility means the time at which data can be seen or read by the enterprise (for example, prior to publishing, a novel is visible only to the author and the editor; after publishing, it is visible to the general public). Concurrency means the manner in which multiple people are allowed to simultaneously access data.
· Legacy integration with the BOM where the legacy data is read but not updated by the BOM. The legacy data is either static, or the BOM is not concerned with how current the data is or whether it might have changed since the last read.
· A legacy structure often found in batch systems or in systems where workflow managers are involved. Data for in-progress transactions is kept in a separate file (the pending file) from the file (the master file) that houses data from completed transactions. Data in the pending file has very limited visibility. Data in the master file is visible to the enterprise and is the "official" data at any point in time.
· All data is stored in the same file or database, but the structure of the data is such that it is unlikely that two people would be updating the same data at the same time. Segmented-data legacy systems are generally non-normalized and have a lot of redundant data. All data is visible to the enterprise, but the data structure is such that you should not need to look outside of your segment.
· The legacy system is not structured for concurrent update and has few if any controls to prevent multi-user updates from stomping on each other. To avoid concurrent update problems, the business has devised manual procedures that control the update process to minimize or eliminate conflicts (access is serialized). This is often done by physically passing around some token (often a file folder); you are not allowed to work on data unless you have physical possession of the token. All data is visible to the enterprise, but the intent is that data within a particular scope only be viewed by those holding the token for that scope.
· Often found in CICS or IMS systems. The scope of a transaction is controlled by the designer of the access module; each invocation of the access module represents a transaction boundary. Data is visible to the enterprise after each invocation.
· Generally found in client/server systems based on a relational database. The client can begin and end a transaction, and multiple accesses can take place within the scope of a single transaction. Data is visible only to the client while the transaction is in progress, and becomes visible to the enterprise upon transaction commit.
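The pending-file/master-file structure described above can be sketched as follows; the class and method names are hypothetical, and real systems would of course use files or databases rather than in-memory maps:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a pending-file / master-file structure: data for
// in-progress transactions lives in the pending file and is invisible
// to the enterprise; completing a transaction promotes it to the
// master file, which holds the "official" data.
class PendingMasterStore {
    private final Map<String, String> pending = new HashMap<>();
    private final Map<String, String> master = new HashMap<>();

    // Begin an in-progress transaction; visible only to its workers.
    void beginWork(String key, String value) {
        pending.put(key, value);
    }

    // What the enterprise sees: only the master file.
    String officialValue(String key) {
        return master.get(key);
    }

    // Completing the transaction promotes pending data to the master.
    void complete(String key) {
        String v = pending.remove(key);
        if (v != null) {
            master.put(key, v);
        }
    }
}
```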
For this paper, push and pull strategies are described from the point of view of the initialization of the BOM. If you are populating the BOM before a business client needs the data, you are pushing. If the population of the BOM is triggered by a business client requesting data, you are pulling.
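The push/pull distinction can be illustrated with a minimal Java sketch; the `Bom` class and its methods are hypothetical stand-ins for the BOM's population paths:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal BOM cache that can be populated either way.
class Bom {
    private final Map<String, String> objects = new HashMap<>();
    // Hypothetical legacy lookup used by the pull path.
    private final Function<String, String> legacyRead;

    Bom(Function<String, String> legacyRead) {
        this.legacyRead = legacyRead;
    }

    // Push: an external client loads data before anyone asks for it.
    void push(String key, String value) {
        objects.put(key, value);
    }

    // Pull: a business client's request triggers lazy population.
    String get(String key) {
        return objects.computeIfAbsent(key, legacyRead);
    }

    boolean isPopulated(String key) {
        return objects.containsKey(key);
    }
}
```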
Push-style integration components act as clients to the BOM. They generally assume that someone knows exactly what data a business client of the BOM needs and when they need it. Data usually moves around in very large chunks. The clients themselves have no persistent state data; they contain function only.
· Can be manually triggered, where a human supplies parameters as to which legacy data to retrieve, or scheduled as a batch job that examines newly arrived data in legacy systems. Granularity is usually large (for example, get all of the data associated with a life insurance application and populate the BOM with it). Attempts to read all of the data necessary for the business clients of the BOM to perform their function. Typically populates the BOM using the same API (application programming interface) as the business clients. Controls the transactional scope of the BOM interaction; if a synchronous legacy interaction fails, it can roll back the BOM transaction.
· Synchronization Client: Can be manually triggered, but is usually scheduled as a batch job. Examines data in both the legacy system and the BOM, compares values, and decides whether either side needs to be updated. The amount of data read can be large; the amount of data updated is usually small. Granularity will vary depending on whether you have one client or several that do specific pieces. Controls the transactional scope of the BOM interaction; if a synchronous legacy read fails, it can roll back the BOM transaction. Rollback of the legacy system will depend on the legacy system's transactional capabilities.
· Update Client: Can be manually triggered, but is usually scheduled as a batch job. Usually examines BOM data to determine whether a particular status (e.g. complete, cancelled) has been reached. Updates the legacy system and then removes data from the BOM; needs to be intelligent enough not to remove shared data. Updates to the legacy system and deletes from the BOM will usually be done in a pseudo two-phase commit: deletes will not take place unless the legacy updates have been successful. The legacy updates most likely to fail will be done first so that the legacy system is not left half updated. If a legacy update does fail, either compensating transactions can be used or humans can be contacted to fix it (usually through an exception log). If any of the legacy updates are asynchronous, a response client will likely be involved (see below), and the delete of BOM data will be pending until all asynchronous responses have been resolved.
· Response Polling Client: Either a constantly running task or scheduled as a batch job. Collects responses from asynchronous interactions with legacy systems, examines them, and then decides what to do (for example: read the transaction log tape from the nightly legacy batch update, determine whether any of the updates from BOM transactions failed, cancel the delete of BOM data if they did, and make an entry in the exception log so a human can fix the situation).
· Response Event Client: Basically the same as a Response Polling Client, but used in situations where a system event can be received. Most often this is the receipt of a message into a message queue; the message queue software would start up the Response Event Client.
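The pseudo two-phase commit used by the update client can be sketched as follows. The class and method names are illustrative only; the essential points from the text are that the legacy updates run first (riskiest first), BOM data is deleted only if every legacy update succeeded, and failures land in an exception log for a human:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Sketch of the update client's pseudo two-phase commit: legacy
// updates are attempted first; the BOM delete runs only if all of
// them succeed.
class UpdateClient {
    private final List<BooleanSupplier> legacyUpdates = new ArrayList<>();
    private final Runnable deleteBomData;
    private final List<String> exceptionLog = new ArrayList<>();

    UpdateClient(Runnable deleteBomData) {
        this.deleteBomData = deleteBomData;
    }

    // Register updates ordered most-likely-to-fail first.
    void addLegacyUpdate(BooleanSupplier update) {
        legacyUpdates.add(update);
    }

    boolean run() {
        for (BooleanSupplier update : legacyUpdates) {
            if (!update.getAsBoolean()) {
                // Do not delete BOM data; leave a trail for a human.
                exceptionLog.add("legacy update failed; BOM data retained");
                return false;
            }
        }
        deleteBomData.run();
        return true;
    }

    List<String> log() {
        return exceptionLog;
    }
}
```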
Pull-style integration components are buried at the lowest levels of the server. They take the view of lazy population of the BOM. Data usually moves in smaller chunks. Components may maintain persistent state data if there is a need for point-in-time snapshots of legacy data so that concurrent update issues can be detected and dealt with. For example, the pull component may keep a copy of the legacy data as of the point in time it was copied to the BOM. When it is time (possibly weeks later) to update back to the legacy system, it will see whether other users have changed the legacy data by comparing the snapshot with the current legacy data. If there is a discrepancy, it decides what to do about it.
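The snapshot comparison just described can be sketched as a small guard object; the names are hypothetical, and real snapshots would be persisted rather than held in memory:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of snapshot-based conflict detection: remember the legacy
// data as of the time it was copied into the BOM, and at update time
// compare that snapshot with the current legacy value.
class SnapshotGuard {
    private final Map<String, String> snapshots = new HashMap<>();

    // Record legacy data as it is copied into the BOM.
    void remember(String key, String legacyValue) {
        snapshots.put(key, legacyValue);
    }

    // True if another user changed the legacy data since the snapshot.
    boolean conflicts(String key, String currentLegacyValue) {
        String snap = snapshots.get(key);
        return snap != null && !snap.equals(currentLegacyValue);
    }
}
```

What to do on a detected conflict (merge, overwrite, or escalate to a human) is the policy decision the text leaves to the component.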
· For each BOM class, there is a matching legacy adapter class. The class can access one or more legacy systems. Pulls data from legacy systems one object at a time. Works well if the BOM and legacy structures are very close. May be patterned and partially generatable. Can be slow because of the small granularity of interaction with the legacy system.
· For each legacy structure that you are wrapping, there is a matching adapter class. This works well for keeping snapshot copies of legacy data. May be patterned and partially generatable. Can be slow because of the small granularity of interaction with the legacy system.
· The adapter is structured around the business function you are trying to perform. Several BOM classes would potentially trigger it. Once triggered, it would attempt to get all (or a consistent subset) of the data expected to be needed. Can be efficient because of the large-grained interaction with the legacy systems.
· Similar to the adapter above, but structured around access patterns where, if a particular object is touched, you can anticipate that other nearby objects will also be needed. Can be efficient because of the large-grained interaction with the legacy systems. May be more maintainable because access patterns change at a slower rate than application requirements.
· The BOM is intended to contain a small subset of the total data in the legacy systems. If a find request such as "give me all of the people named Smith" were to require the instantiation of 5,000 Smiths in the BOM, this could pose a problem, especially if the business client were to select one of them and then ignore the other 4,999 (which is normally the situation). This adapter passes data from the legacy system through to the BOM business client without instantiating objects within the BOM itself. When the business client has selected which object it wants to work with, a single-item find will be used with one of the other adapter types to do the instantiation in the BOM.
· If the BOM wishes to implement a long-running business transaction, a unit-of-work (UOW) object can be used to keep track of all objects that have participated in the scope of the transaction. When the transaction is to be committed, a Legacy Update Coordination Object (see below) can move data from all of the participants back to the legacy system.
· Triggered by the commit of a UOW, by an explicit call from a business client, or by a scheduled batch job. Figures out which objects in the BOM should be used to update the legacy systems. If a UOW object is used, this is simply the list of participants; if no UOW is used, it will likely examine some status field and then select other objects based on business requirements. Will implement a pseudo two-phase commit with the legacy system as discussed under "Update Client" above. May interact with a Response Polling Client or a Response Event Client (see above).
· Triggered by an activity in the BOM (such as specific data changes) or by a scheduled batch job. Performs the same function as the "Synchronization Client" above, except that the data exchange takes place at the lowest levels of the server, possibly using APIs that are specific to its function. Legacy snapshot data may also have been kept so that concurrent updates can be dealt with.
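The UOW pairing described above can be sketched in miniature as follows; `UnitOfWork` and its methods are hypothetical names, and the coordinator is reduced to a callback that would, in a real system, drive the pseudo two-phase commit against the legacy systems:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Sketch of the UOW / coordination pairing: the UOW tracks every BOM
// object touched during the long-running business transaction, and on
// commit a coordination object pushes each participant back to legacy.
class UnitOfWork {
    // Insertion-ordered set: each participant is tracked exactly once.
    private final Set<String> participants = new LinkedHashSet<>();

    // Called whenever a BOM object joins the transaction's scope.
    void register(String bomObjectId) {
        participants.add(bomObjectId);
    }

    // Commit hands the participant list to the coordination object.
    void commit(Consumer<String> legacyUpdateCoordinator) {
        participants.forEach(legacyUpdateCoordinator);
        participants.clear();
    }
}
```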