Monday, February 7, 2011

An (Obvious) query strategy for v3 Components


One of the difficulties of HL7v3 messaging is getting around the fact that it is a messaging standard, not a data standard. While a messaging standard is a good suggestion as to what a data model should look like, it is not a data model. Much of the content of an HL7v3 message relates to validating and acting upon data.
As part of my PostgreSHR work, I have analyzed the 12 sets of health data (identified via CHI standards) and devised what I think is a nice schema that facilitates the storage and retrieval of data in a somewhat normalized form.

I'll describe more about this as I continue my implementation of the PostgreSHR project. Anyways, I've devised a very simple query strategy for this type of schema that I think will work quite well (will keep you posted if this actually works).
Basically, within PostgreSHR, all incoming messages (from any format) are canonicalized into an object model that loosely maps to these tables. When querying the PostgreSHR data store, I plan on having prototypical components populated and passed to the query manager service.
To explain that, think of a message as conveying a health service event that is made up of components which participate in the whole. Take a referral, for example: it is a health service event composed of target of record, attestor, author, document, and other components. When the persistence service writes messages to this schema, it uses a component persister to read the data from the components and put them into relational form (i.e., the RDBMS). When querying, I hope to use a similar process, only in reverse (i.e., instead of writing to the DB, the function will intersect result sets).
For example, say I persist a referral that is made up of these components:

Component        | Relation to Container | Data
Client           | TargetOf Referral     | 1: John Smith
HealthcareWorker | AuthorOf Referral     | 2: Dr. James D. Nephrologist
HealthcareWorker | AttestorOf Referral   | 2: Dr. James D. Nephrologist
Document         | SubjectOf Referral    | Blah blah blah blah...
ProvisionRequest | SubjectOf Referral    | See an oncologist...
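
To make the example concrete, here is a rough Python sketch of how such a referral might look as an object model. The class and field names (Component, HealthServiceEvent, kind, role, data) are hypothetical and only mirror the three columns of the table; they are not PostgreSHR's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    kind: str  # component type, e.g. "Client" or "HealthcareWorker"
    role: str  # relation to the container, e.g. "TargetOf"
    data: str  # payload, e.g. "1: John Smith"


@dataclass
class HealthServiceEvent:
    event_type: str  # e.g. "Referral"
    components: List[Component] = field(default_factory=list)

    def add(self, kind: str, role: str, data: str) -> "HealthServiceEvent":
        # Append a component and return self so calls can be chained.
        self.components.append(Component(kind, role, data))
        return self


# The referral from the table, one component per row.
referral = (
    HealthServiceEvent("Referral")
    .add("Client", "TargetOf", "1: John Smith")
    .add("HealthcareWorker", "AuthorOf", "2: Dr. James D. Nephrologist")
    .add("HealthcareWorker", "AttestorOf", "2: Dr. James D. Nephrologist")
    .add("Document", "SubjectOf", "Blah blah blah blah...")
    .add("ProvisionRequest", "SubjectOf", "See an oncologist...")
)
```

A component persister would then walk referral.components and write each one to its table.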


Later on, when someone queries for the message, the registry service creates component data like this:
Component        | Relation to Container | Data
Client           | TargetOf Referral     | 1: John Smith
HealthcareWorker | AuthorOf Referral     | 2: Dr. James D. Nephrologist


The component query classes will use the data to construct a series of selects and intersects. So the algorithm is as follows:

FOREACH COMPONENT IN QUERY CONTAINER
    FIND THE COMPONENT QUERY CREATOR
    EXECUTE THE QUERY CREATOR
    INTERSECT RESULTS
END FOREACH
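
The loop above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the QueryComponent class, the creator functions and the QUERY_CREATORS lookup are hypothetical names, the meaning of the two function arguments (an identifier and a role code) is a guess, and the %s placeholders follow the style of Python DB-API drivers such as psycopg2. The two SQL function names and the argument values are the ones used in this post's example queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QueryComponent:
    kind: str       # "Client", "HealthcareWorker", ...
    entity_id: int  # assumed: first argument of the SQL function
    role_code: int  # assumed: second argument of the SQL function


SqlAndArgs = Tuple[str, Tuple[int, int]]


def client_query(c: QueryComponent) -> SqlAndArgs:
    # Hypothetical query creator for Client components.
    return "SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(%s, %s)", (c.entity_id, c.role_code)


def worker_query(c: QueryComponent) -> SqlAndArgs:
    # Hypothetical query creator for HealthcareWorker components.
    return "SELECT * FROM FIND_HSR_BY_PTCPT_PTCPTN(%s, %s)", (c.entity_id, c.role_code)


QUERY_CREATORS: Dict[str, Callable[[QueryComponent], SqlAndArgs]] = {
    "Client": client_query,
    "HealthcareWorker": worker_query,
}


def build_query(components: List[QueryComponent]) -> Tuple[str, List[int]]:
    parts: List[str] = []
    params: List[int] = []
    for c in components:
        creator = QUERY_CREATORS[c.kind]  # find the component query creator
        sql, args = creator(c)            # execute the query creator
        parts.append(sql)
        params.extend(args)
    # Chain the per-component selects so the database intersects the results.
    return "\nINTERSECT ALL\n".join(parts), params


sql, params = build_query([
    QueryComponent("Client", 1, 256),
    QueryComponent("HealthcareWorker", 1, 32678),
])
```

Keeping the values in params rather than formatting them into the SQL text lets the driver handle quoting.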


With each iteration, the query grows as follows:

I1:
SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(1, 256)

I2:
SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(1, 256)
INTERSECT ALL
SELECT * FROM FIND_HSR_BY_PTCPT_PTCPTN(1, 32678)


By executing the query in this manner, the running result can only get smaller (or stay the same size) with each intersection against a subsequent result set. So long as PostgreSHR is indexed correctly (and methinks it is), I should see good performance from at least the document registration components (the component de-persisting leaves something to be desired, but that is for a later post).
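
The shrinking effect can be illustrated outside the database with a small Python sketch. It models INTERSECT ALL as a multiset intersection (a row that appears m times on one side and n times on the other is kept min(m, n) times), which is what Counter's & operator computes. The record IDs and the three result sets are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def running_intersections(result_sets: Iterable[List[int]]) -> Iterator[Counter]:
    """Yield the multiset intersection after each result set is folded in."""
    current = None
    for rows in result_sets:
        counted = Counter(rows)
        # Counter & Counter keeps each row min(m, n) times.
        current = counted if current is None else current & counted
        yield current


# Made-up health service record IDs returned by three component queries.
by_client = [101, 102, 103, 104]
by_author = [102, 104, 105]
by_document = [104, 106]

steps = list(running_intersections([by_client, by_author, by_document]))
sizes = [sum(step.values()) for step in steps]
remaining = sorted(steps[-1].elements())
```

Here sizes comes out as [4, 2, 1] and remaining as [104]: each additional component can only narrow the set of matching records.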

Anyways, it's late and I'm going crazy from thinking of how to optimize this. I guess this will be my task over the next few days.
