Wednesday, February 9, 2011

Translating Everest rc1 Details to Acknowledgement Error Codes


Everest has always included the IResultDetail interface for reporting errors back to developers who serialize or parse HL7v3 structures. More recently, in Everest rc1, more implementations of this interface have been added to make error reporting more granular. One of the things we can do with this data is generate transport acknowledgement details to send back to other systems so they can understand the errors as well.

First, let's take a look at the IResultDetail classes that come with Everest:


The built-in result detail classes all inherit from ResultDetail. A quick review of each of the classes (from the Tech Exchange Library):

Class | Description
MandatoryElementMissingResultDetail | Identifies that an element marked as mandatory is missing from the structure
VocabularyIssueResultDetail | Identifies that an issue has been detected with the vocabulary values specified in a structure
RequiredElementsMissingResultDetail | Identifies that an element marked as required is missing from the structure and no null flavor was specified
NotImplementedResultDetail | Identifies that an element is not recognized by the formatter and its data is lost
InsufficientRepetitionsResultDetail | Identifies that an element does not meet the minimum number of repetitions
FixedValueMisMatchResultDetail | Identifies that the value of an element does not match its fixed value. The fixed value is taken in place of the original value.
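
Each of these is surfaced through a formatter's Details property after a parse or graph operation. For example, to sift out just the errors, a minimal LINQ sketch (assuming fmtr is a formatter instance like the one we create below, and that IResultDetail and ResultDetailType live in MARC.Everest.Connectors):

using System;
using System.Linq;
using MARC.Everest.Connectors; // IResultDetail, ResultDetailType

// After ParseObject/GraphObject, filter the result details by severity
var errors = fmtr.Details.Where(d => d.Type == ResultDetailType.Error);
foreach (var dtl in errors)
    Console.WriteLine("{0} at {1}", dtl.Message, dtl.Location);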


So, how can we use these to create acknowledgement details? Simple: we just have to iterate through each result detail and create the matching acknowledgement code.

Let's write a simple program that reads in a file and creates a response. First, create a new console project and reference an appropriate CA message assembly, the R1 data type formatter, and the XML ITS1 formatter.




Next, before we start to fill out Program.cs, let's create a function called ValidateMessage that reads a message and returns the validation errors:

static IResultDetail[] ValidateMessage(string file) {

First, we have to set up the formatter. I'm using an XML ITS1 formatter with data types R1 attached as a graph aide:

MARC.Everest.Formatters.XML.ITS1.Formatter fmtr = new MARC.Everest.Formatters.XML.ITS1.Formatter();
fmtr.GraphAides.Add(typeof(MARC.Everest.Formatters.XML.Datatypes.R1.Formatter));

Next, we'll open the file for reading (standard .NET stuff):

// Open the file
FileStream fs = null;
try
{
    fs = File.OpenRead(file);

Then we need the formatter to parse the object from the file; this forces validation and acts as a basic sanity check. Whenever you call ParseObject on a formatter, if the object returned is null, then the stream did not contain HL7v3 content.

IGraphable obj = fmtr.ParseObject(fs);
if (obj == null)
    return new IResultDetail[] {
        new ResultDetail(
            ResultDetailType.Error,
            "Bad Message",
            (string)null
        )
    };
Otherwise, we return the formatter's Details array, which holds the validation results:

return fmtr.Details;

Finish up our function:

    }
    finally
    {
        if (fs != null) fs.Close();
    }
}

And the ValidateMessage function is complete. The next function we'll create will turn the array of IResultDetail instances into HL7v3 acknowledgement details.

static IEnumerable<AcknowledgementDetail> CreateAcks(IResultDetail[] dtls)
{


The fact that we can create this function is due to Everest's new combining functionality whereby all AcknowledgementDetail classes are combined into one class type (instead of 10). To create the Acks, we need an IEnumerable structure that we can populate. We'll iterate over the dtls parameter and create a new AcknowledgementDetail for each IResultDetail we're passed:

List<AcknowledgementDetail> retVal = new List<AcknowledgementDetail>();
foreach (var dtl in dtls)
{
    AcknowledgementDetail ackDtl = new AcknowledgementDetail();

The first thing we need to do is populate the type code. Since the IResultDetail type isn't used by the generated assembly, we need to do a manual translation:

switch(dtl.Type)
{
    case ResultDetailType.Error:
        ackDtl.TypeCode = AcknowledgementDetailType.Error;
        break;
    case ResultDetailType.Warning:
        ackDtl.TypeCode = AcknowledgementDetailType.Warning;
        break;
    case ResultDetailType.Information:
        ackDtl.TypeCode = AcknowledgementDetailType.Information;
        break;
}

Next, we want to translate the concrete type of the IResultDetail into a code. Here are some examples of how I would do this:

if (dtl is InsufficientRepetionsResultDetail)
    ackDtl.Code = AcknowledgementDetailCode.InsufficientRepetitions;
else if (dtl is MandatoryElementMissingResultDetail)
    ackDtl.Code = AcknowledgementDetailCode.MandatoryElementWithNullValue;
else if (dtl is NotImplementedElementResultDetail)
    ackDtl.Code = AcknowledgementDetailCode.SyntaxError;

 
There would be an if-statement for each IResultDetail type you'd want to translate (a table-driven alternative is sketched after the function, below). Next, I just fill in the location and text:

ackDtl.Text = dtl.Message;
ackDtl.Location = new SET<ST>((ST)dtl.Location);


Then we add it to our return value and finish up the function:

        retVal.Add(ackDtl);
    }

    return retVal;
}
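
As an aside: if the number of translations grows, the if-chain above can be swapped for a table-driven lookup. Here's a minimal sketch using only the three mappings shown in this post (the dictionary and helper are my own refactoring, not part of Everest):

// Requires System and System.Collections.Generic
// Map IResultDetail types to acknowledgement codes (only the translations demonstrated above)
static readonly Dictionary<Type, AcknowledgementDetailCode> codeMap =
    new Dictionary<Type, AcknowledgementDetailCode>()
    {
        { typeof(InsufficientRepetionsResultDetail), AcknowledgementDetailCode.InsufficientRepetitions },
        { typeof(MandatoryElementMissingResultDetail), AcknowledgementDetailCode.MandatoryElementWithNullValue },
        { typeof(NotImplementedElementResultDetail), AcknowledgementDetailCode.SyntaxError }
    };

static AcknowledgementDetailCode? TranslateCode(IResultDetail dtl)
{
    AcknowledgementDetailCode code;
    if (codeMap.TryGetValue(dtl.GetType(), out code))
        return code;
    return null; // no translation known for this detail type
}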


Now all we have to do is fill out our Main function to load the message and create a response. I'm going to use MCCI_IN000002CA just to illustrate.


static void Main(string[] args)
{



MCCI_IN000002CA ret = new MCCI_IN000002CA(
    Guid.NewGuid(),
    DateTime.Now,
    ResponseMode.Immediate,
    MCCI_IN000002CA.GetInteractionId(),
    MCCI_IN000002CA.GetProfileId(),
    ProcessingID.Production,
    AcknowledgementCondition.Never
);

ret.Acknowledgement = new Acknowledgement(
    AcknowledgementType.ApplicationAcknowledgementAccept,
    new TargetMessage()
);
ret.Acknowledgement.AcknowledgementDetail.AddRange(
    CreateAcks(ValidateMessage("C:\\test.xml"))
);


// Format to the screen
MARC.Everest.Formatters.XML.ITS1.Formatter fmtr = new MARC.Everest.Formatters.XML.ITS1.Formatter();
fmtr.GraphAides.Add(typeof(MARC.Everest.Formatters.XML.Datatypes.R1.Formatter));
fmtr.ValidateConformance = false; // Just a sample; I know this message isn't correct
fmtr.GraphObject(Console.OpenStandardOutput(), ret);

 }


And voila, you should get a response message with proper AcknowledgementDetail entries.
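
For reference, the acknowledgement portion of the serialized output has roughly this shape (a hand-written sketch; the ids, codes, and exact element layout depend on your message and the CA schemas):

<acknowledgement typeCode="AA">
  <targetMessage>
    <id root="..."/>
  </targetMessage>
  <acknowledgementDetail typeCode="E">
    <code code="SYN"/>
    <text>Bad Message</text>
    <location>...</location>
  </acknowledgementDetail>
</acknowledgement>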

Monday, February 7, 2011

An (Obvious) query strategy for v3 Components


One of the difficulties of HL7v3 messaging is getting around the fact that it is a messaging standard, not a data standard. While a messaging standard is a good suggestion as to what a data model should look like, it is not a data model. Much of the information in an HL7v3 message relates to validating and acting upon the data, not to the data itself.
As part of my PostgreSHR work, I have analyzed the 12 sets of health data (identified via CHI standards) and devised what I think is a nice schema that facilitates the storage and retrieval of data in a somewhat normalized form.

I'll describe more about this as I continue my implementation of the PostgreSHR project. Anyways, I've devised a very simple query strategy for this type of schema that I think will work quite well (will keep you posted if this actually works).
Basically, within PostgreSHR, all incoming messages (from any format) are canonicalized into an object model that loosely maps to these tables. When querying the PostgreSHR data store, I plan on having prototypical components populated and passed to the query manager service.
To explain that, think of a message as conveying a health service event that is made up of components which participate in the whole. For example, a referral: a referral is a health service event composed of target-of-record, attestor, author, document, etc. components. When the persistence service writes messages to this schema, it uses a component persister to read the data from the components and put them into relational form (ie: the RDBMS). When querying, I hope to use a similar process, only in reverse (ie: instead of writing to the DB, the function will intersect result sets).
For example, say I persist a referral that is made up of these components:

Component | Relation to Container | Data
Client | TargetOf Referral | 1: John Smith
HealthcareWorker | AuthorOf Referral | 2: Dr. James D. Nephrologist
HealthcareWorker | AttestorOf Referral | 2: Dr. James D. Nephrologist
Document | SubjectOf Referral | Blah blah blah blah...
ProvisionRequest | SubjectOf Referral | See an oncologist...


Later on, when someone queries for the message, the registry service creates component data like this:

Component | Relation to Container | Data
Client | TargetOf Referral | 1: John Smith
HealthcareWorker | AuthorOf Referral | 2: Dr. James D. Nephrologist


The component query classes will use the data to construct a series of selects and intersects. So the algorithm is as follows:

FOREACH COMPONENT IN QUERY CONTAINER
    FIND THE COMPONENT QUERY CREATOR
    EXECUTE THE QUERY CREATOR
    INTERSECT RESULTS
END FOREACH


Through each iteration, the following query is built:

I1 :
SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(1, 256)

I2:
SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(1,256)
INTERSECT ALL
SELECT * FROM FIND_HSR_BY_PTCPT_PTCPTN(1, 32678)
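
A minimal C# sketch of how that loop might assemble the SQL (IComponentQueryCreator and the surrounding plumbing are hypothetical names of mine; the post doesn't show PostgreSHR's real interfaces):

using System.Collections.Generic;

// Hypothetical stand-in for a PostgreSHR component query creator
interface IComponentQueryCreator
{
    // Returns e.g. "SELECT * FROM FIND_HSR_BY_CLNT_PTCPTN(1, 256)"
    string CreateQuery(object component);
}

static class HsrQueryBuilder
{
    // Chain each component's SELECT together with INTERSECT ALL,
    // per the algorithm above; each intersection narrows the result set
    public static string BuildQuery(IList<object> components, IList<IComponentQueryCreator> creators)
    {
        List<string> parts = new List<string>();
        for (int i = 0; i < components.Count; i++)
            parts.Add(creators[i].CreateQuery(components[i]));
        return string.Join("\r\nINTERSECT ALL\r\n", parts.ToArray());
    }
}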


By executing the query in this manner, each successive intersection produces a smaller and smaller result set. So long as PostgreSHR is indexed correctly (and methinks it is), I should see significant query performance from at least the document registration components (the component de-persisting leaves something to be desired, but that is for a later post).

Anyways, it's late and I'm going crazy from thinking of how to optimize this. I guess this will be my task over the next few days.

What's On This Blog

Hello kiddies... looks like your friend the mad programmer is just ... (sorry, I was just thinking of "Tales from the Crypt"). Anyways, I have something of note to post; however, I realized that I didn't set any ground rules (for myself) as to what is going to be posted here. So, what are you going to find on the Mad Programmer's Blog (other than madness)? Here are my selected topics for this blog (for the time being at least):

  • PostgreSQL discoveries and techniques
  • .NET Framework (mostly C#)
  • BizTalk Server stuff
  • HL7v3 stuff that I discover

I'll be back tomorrow with a real post.

Saturday, February 5, 2011

PostgreSQL 9.0 ... "Feature" for bytea Type


Several of the functions in the PostgreSHR project that I'm currently working on need to save large objects into tables. These include the message log (ie: which messages the service has already processed) for query continuation, the contents of clinical documents, etc.
Anyways, a couple of days ago I upgraded from Postgres 8.4 to 9.0. Everything seemed to work fine, until I started analyzing the log files of the message persistence service components. The following error appeared:

Data at the root level is invalid (1,1)

This error was raised by a formatter within the Everest Framework. After searching the database, I found that messages were being persisted into the message table properly; however, whenever I read the data back, Npgsql returned a corrupted byte array. After some digging, I found that all reads of bytea columns through Npgsql returned corrupted data.
Apparently this is due to a change in PostgreSQL 9.0 that affects the encoding of bytea columns: the default bytea_output setting changed from the old 'escape' format to a new 'hex' format, which older client libraries don't understand. So, to anyone struggling with seemingly nonsensical data loss from PGSQL 9.0 tables, here is the fix:

ALTER DATABASE XXXXXXDB SET bytea_output='ESCAPE';

After applying this to the database, all functionality was restored!
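
To see the difference the setting makes, here's a quick sketch (the literal is my own example; the output follows the bytea_output setting):

-- Default in PostgreSQL 9.0 (bytea_output = 'hex'):
SELECT 'abc'::bytea;   -- returns \x616263

-- Pre-9.0 behaviour, restored by the ALTER DATABASE above:
SET bytea_output = 'escape';
SELECT 'abc'::bytea;   -- returns abc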

Friday, February 4, 2011

Filtering Normalized Data from a PostgreSQL Table


One of the more difficult challenges that I've come up against in the past day of coding is filtering normalized data structures (namely "Names" from an HL7v3 message). Here is the problem.

What's in a name?

Apparently quite a bit... Most systems are happy with first name, last name, and maybe a middle name. However, in HL7v3 (and other systems, I imagine) flexibility is the word. So, if you're named John Jacob Jingleheimer-Schmidt, HL7v3 messaging can portray that. However, any receiving system also needs to be able to store this data and query it. So if you register a record as John Jacob Jingleheimer-Schmidt and someone searches for John Jingleheimer-Schmidt or John Schmidt, that record needs to be returned.

Enter HL7v3 EN names:
<name>
    <prefix>Mr.</prefix>
    <given>John</given>
    <given>Jacob</given>
    <family>Jingleheimer</family>
    <delimiter>-</delimiter>
    <family>Schmidt</family>
</name>

Yes, they look great, and they can in fact capture all elements of a name that could possibly exist. However, in a relational database they're a nightmare, especially to query.

Storing the EN

Storing the EN is easy enough. You just have to keep in mind that a name is really a container for name parts. With that in mind, we can create complex names by correlating individual tuples in a database. In the PostgreSHR project I'm working on right now, we store them in a table that looks a lot like this:

Name Part ID | Name Part Value | Name Part Type | Correlation ID
1 | Mr | PFX | 1
2 | John | GIV | 1
3 | Jacob | GIV | 1
4 | Jingleheimer | FAM | 1
5 | - | DEL | 1
6 | Schmidt | FAM | 1


The correlation ID can be used when de-serializing the name components back into a name container. This particular relational table works well for storing the name data; however (as you can imagine), it is quite difficult to query with SQL.
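
For concreteness, here's a minimal sketch of such a table, using the column names that appear in the query function later in this post (the surrogate key and index are my own guesses; PostgreSHR's actual DDL may differ):

-- Name-part table: one row per part, correlated into whole names by NAME_SET_ID
CREATE TABLE NAME_CMP_TBL (
    NAME_CMP_ID    SERIAL PRIMARY KEY,     -- hypothetical surrogate key
    NAME_CMP_VALUE VARCHAR NOT NULL,       -- e.g. 'John'
    NAME_CMP_CLS   VARCHAR NOT NULL,       -- e.g. 'GIV', 'FAM', 'PFX', 'DEL'
    NAME_SET_ID    DECIMAL(20,0) NOT NULL  -- correlation id shared by all parts of one name
);

-- An index on (value, class) supports the part-by-part filtering described below
CREATE INDEX NAME_CMP_VAL_CLS_IDX ON NAME_CMP_TBL (NAME_CMP_VALUE, NAME_CMP_CLS);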

The concept behind the query

Recently I've come up with a nifty solution for querying this data. It's actually quite simple and works really well (I tried it on a 10,000-tuple set). Basically, you iterate over the parts of the name you're looking for (your needle) and intersect each result set with the previous one. So, let's say we have a database with four people: John Smith, Sally Smith, Billy Jim Smith and Sarah Sally Smith. Our name component table will look like this:

Name Part ID | Name Part Value | Name Part Type | Correlation ID
1 | John | GIV | 1
2 | Smith | FAM | 1
3 | Sally | GIV | 2
4 | Smith | FAM | 2
5 | Billy | GIV | 3
6 | Jim | GIV | 3
7 | Smith | FAM | 3
8 | Sarah | GIV | 4
9 | Sally | GIV | 4
10 | Smith | FAM | 4


And let's pretend we want to query for Smith, Sarah. We know that our result should be Sarah Sally Smith as that is the most appropriate match. We can use this algorithm to find all the parts:

SELECT correlation FROM name_table INTO tResult1
    WHERE name_part = filters[0].name_part AND name_value = filters[0].name_value;
FOR EACH remaining filter IN filters
    SELECT correlation FROM name_table INTO tResult2
        WHERE name_part = filter.name_part AND name_value = filter.name_value
    INTERSECT ALL
    SELECT * FROM tResult1;
    DELETE FROM tResult1;
    INSERT INTO tResult1 SELECT * FROM tResult2;
    DELETE FROM tResult2;
END FOR EACH
RETURN tResult1;

Let's trace through each iteration:
Iteration | Filters | Filter | tResult1 | tResult2
0 | { FAM:Smith, GIV:Sarah } | | { 1, 2, 3, 4 } |
1 | | FAM:Smith | { 1, 2, 3, 4 } |
2 | | GIV:Sarah | { 1, 2, 3, 4 } | { 4 }
3 | | | { 4 } |


The Actual Code

The algorithm is easy enough, but what about the PL/pgSQL code for PostgreSQL? Well, after thinking for quite a while, I came up with this:

CREATE OR REPLACE FUNCTION FIND_NAME_SET(
    NAME_CMP_VAL_IN IN VARCHAR[],
    NAME_CMP_CLS_IN IN VARCHAR[]
) RETURNS TABLE (NAME_SET_ID DECIMAL(20,0)) AS
$$
DECLARE
    CNCMP    DECIMAL;
BEGIN


    -- CREATE TEMP TBLS
    IF((SELECT COUNT(*) FROM information_schema.tables WHERE table_name = 'tmp_name_res_tbl') = 1) THEN
        DROP TABLE TMP_NAME_RES_TBL;
        DROP TABLE TMP_NAME_RET_TBL;
    END IF;
    
    CREATE LOCAL TEMPORARY TABLE TMP_NAME_RES_TBL (
        NAME_SET_ID    DECIMAL(20,0)
    ) ON COMMIT DROP;
    CREATE LOCAL TEMPORARY TABLE TMP_NAME_RET_TBL (
        NAME_SET_ID    DECIMAL(20,0)
    ) ON COMMIT DROP;
    
    -- FILTER THE FIRST SET
    INSERT INTO TMP_NAME_RET_TBL
        SELECT DISTINCT A.NAME_SET_ID
        FROM NAME_CMP_TBL A
        WHERE
            NAME_CMP_VALUE = NAME_CMP_VAL_IN[1]
        AND
            NAME_CMP_CLS = NAME_CMP_CLS_IN[1];


    -- SUBSEQUENT
    IF(ARRAY_LOWER(NAME_CMP_VAL_IN, 1) < ARRAY_UPPER(NAME_CMP_VAL_IN, 1)) THEN
        FOR CNCMP IN ARRAY_LOWER(NAME_CMP_VAL_IN, 1) + 1 .. ARRAY_UPPER(NAME_CMP_VAL_IN, 1) LOOP
            INSERT INTO TMP_NAME_RES_TBL
                SELECT DISTINCT A.NAME_SET_ID
                FROM NAME_CMP_TBL A
                WHERE
                    NAME_CMP_VALUE = NAME_CMP_VAL_IN[CNCMP]
                AND
                    NAME_CMP_CLS = NAME_CMP_CLS_IN[CNCMP]
            INTERSECT ALL
                SELECT A.NAME_SET_ID FROM TMP_NAME_RET_TBL A;


            -- ADD RESULTS
            DELETE FROM TMP_NAME_RET_TBL;
            INSERT INTO TMP_NAME_RET_TBL
                SELECT *
                FROM TMP_NAME_RES_TBL;
            DELETE FROM TMP_NAME_RES_TBL;
        END LOOP;
    END IF;
    
    RETURN QUERY SELECT * FROM TMP_NAME_RET_TBL;
END;
$$ LANGUAGE plpgsql;

Notice that I create the TEMPORARY tables TMP_NAME_RET_TBL and TMP_NAME_RES_TBL as my tResult1 and tResult2 tables, respectively. Also, I'm using PostgreSQL's array types to facilitate passing in the name component values and their classifications. Not exactly the best solution, but there is some method to this madness; namely, that it is easier to call this function via:

SELECT * FROM FIND_NAME_SET(ARRAY['Smith','Sally'], ARRAY['FAM','GIV']);
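
Against the four-person sample set above, this call returns the NAME_SET_ID values 2 and 4 (Sally Smith and Sarah Sally Smith): FAM:Smith matches correlations { 1, 2, 3, 4 }, GIV:Sally matches { 2, 4 }, and the intersection is { 2, 4 }.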

Hope that helps anyone who has to query normalized data using a de-normalized set of criteria!