Plan VI Final Write-up for Korean Phase

Sam Habiel, Pharm.D./ OSEHRA/ 14 May 2019

This is a post that will summarize all of the changes that we made to VistA in order for VistA to work with Korean. We will also describe some items that we couldn't solve in our timeline. The table of contents should give you a good idea of all the items in the list. At the end of the document, you can find links to all the KIDS builds, docker images, and the Github repos hosting all the changes made to the VistA source code.

Overall Goal of the Project

I personally know how big VistA is; so we tried to focus the project on items that will have wide applicability. That's why we focused on making CPRS usable with other languages; and why we didn't spend as much time in Roll and Scroll applications to do the same.

Topics

VistA must run in UTF-8 Mode

We used GT.M and YottaDB for the database. We needed to configure VistA to run in UTF-8 mode. To do that, you need to do the following:

The rest of the process is the normal process for creating a VistA instance.

CPRS Read/Write to VistA in UTF-8

Briefly, what we had to do in the Broker is to rewrite wsockc.pas. This was actually rather difficult, as it required good Object Pascal knowledge, which wasn't my forte. Since all strings were in UTF-16 in Object Pascal, we had to convert them to UTF-8 prior to being sent to VistA; and when we received from VistA, we converted from UTF-8 to UTF-16 encoding. On the VistA side, the changes were actually more limited: we only needed to change $L and $E to $ZL and $ZE. There were some other minor changes that had to do with ensuring that the broker know it had to operate in "M" mode rather than in UTF-8 mode, as the strings coming from the broker were not completely in UTF-8 yet (e.g. they may be broken over multiple lines).

There were two other Delphi side changes

With these changes, CPRS and any other Delphi Applications that include our changes will be able to read and write UTF-8 to VistA.

This is described in far more detail here, or dupliciated here.

Reports and TIU Unicode Support

Reports displayed non-ASCII data saved into VistA as ???. This was easily fixed; it was simply a Delphi procedure that needed to take an extra parameter.

TIU notes pasting and saving text was "cleaned up" with filters that removed non-ASCII characters. This was done to prevent saving non-ASCII data to VistA; since VistA now can take everything under the sun, we simply removed the filter.

The last problem was that were was M side code that checked for the presence of ASCII as a proxy for the note not being empty. That means if you wrote a note with non-ASCII text with no spaces, it was considered "empty" and was deleted. That was easy to fix.

This is described in more detail here, or duplicated here.

CPRS Localization Strategy and Framework

At this point, we can send and receive data fully in UTF-8; but all the labels in the program are still in English. Our first "port of call" was to check the official translation framework provided by Delphi: The Integrated Translation Environment (ITE). It had a large amount of problems--we actually couldn't use it even if we wanted to due to these issues. In the end we looked at multiple open source projects, and we chose "Kryvich's Localizer" because it's interoperable with ITE; it required very limited source code changes to get it working; and it had a very simple output format. We translated the CPRS strings into Korean and Chinese.

This is described in more detail here and here; or alternately duplicated here and here.

CPRS Problem List Editing Issue

Delphi and VistA use ASCII 255 as a delimiter when editing a problem in the problem list. ASCII 255 is not valid UTF-8, and would crash VistA. We replaced ASCII 255 with a pipe ('|').

CPRS Date Formatting Issues

This was actually somewhat difficult, as there is no consistency in CPRS date formats. What we observed was that CPRS tended to follow VistA conventions for date formats rather than Windows conventions; and there was significant variability on whether CPRS includes time with seconds or not, plus a significant amount of time where human readable dates were sent to VistA as is for validation rather than sending a Fileman date. The Problem List and Vitals packages were two big culprits in that regard.

The VistA side framework for representing dates was easy to fix. On the CPRS side, to retain backwards compatibility, dates used the original formats for the US locale, and the Windows short date format for the other locales.

Due to the fact that CPRS sends human readable dates to VistA, VistA has to be able to parse these dates back. That essentially means that we have to reproduce the formatting of the Windows short date format in VistA.

There is a lot of VistA code which did its own date formatting. Each of these places needed to be fixed individually.

This is described in more detail with lots of screenshots here and duplicated here.

Ability to enter CJK names

VistA as is could only accommodate ASCII names. One big problem in trying to enter Chinese, Japanese, and Korean (CJK) names was that VistA code checked that the name was uppercase; and if not, tried to uppercase it. Case in language does not exist outside of European languages written in Roman, Cyrillic, or Greek scripts. The first change to allow CJK languages was to only check case if the language supports case.

With the above change, we can enter longer CJK names (as in 2 characters or longer). However, many CJK names are only a single character long. So the other change that was needed was to allow a family name or a given name that is one character long.

These were all the changes we made. These changes do not address the following issues:

More details (including the specific changed routines) are here.

Fileman Data Localization

Fileman has mechanisms to localize date/time displays, number formatting, all the user dialogs that Fileman is responsible for, and the data dictionary. One thing it does not allow to localize though is data. Normally, you would assume that files are "empty" and then you would fill them with entries in your own language. However, many files have reference data that needs to be translated. For example, CPRS Cover Sheet & CPRS Reports have definitions that include their header names stored in file OE/RR Report (#101.24). Ultimately, this turned out to be a hard problem to solve. We came up with a single solution: Non-indexed data can be localized using the dialog framework. There is no way to localize indexed data (data that needs to be searched) without extensive changes to Fileman which this project was not prepared to undertake. The only solution is to replace the indexed data with the other language's strings. This is problematic as this won't work for more than one language at a time. We did this for the menu system, as described below.

More details on the full implementation of data localization are here.

HL7 Send/Receive Support

Actually, there were no changes that were needed due to internationalization. As a bonus, outside of this project, I wrote a how-to guide to get HL7 going on VistA:.

As alluded to in the section on Fileman Data Localization, the menu system consists of indexed data; actually not just indexed, but cached too. Theoretically speaking, the indexes and cache creation code can be modified to check a new data point that changes based on language; but in the end I decided this was too much work to do in the time given for the project. Instead, we translated the user visible text for menus by moving the entries into the dialog file. To switch languages, you have to copy the translations (or restore the original English) from the dialog file.

In addition to translating the data, the menu system driver in "XQ*" needed to be translated as well. An interesting issue came up translating some of the hardcoded strings in the XQ* routines, such as "Select {string} option". These were done in the code by concatenating the string. This does not work well for translation -- other languages will not have the same order of words as English. In these cases, concatenated strings need to be converted to phrases, which are internationalized all together as a unit.

We also needed to translate various Fileman pre-existing dialogs used in the menu system--mostly the ones involved in answering Yes/No questions. Later during this project I discover that translating these and making Fileman use them broke Patient Registration. There was a small bug in Fileman that needed to be fixed.

More details can be found in two slide sets: Part 1 and Part 2. The latter slides (which are the most comprehensive) can also be found on OSEHRA's website here.

CJK Wide Character Issues

This was not an obvious issue except in the Fileman Screen Editor (a word processing field editor); it took me a few weeks to exactly understand the issue. Chinese, Japanese, and Korean characters take up two (2) spaces in VT-100+ terminal emulated applications. Screen oriented VistA applications always assume a single character is one space wide.

This was a pervasive issue in multiple VistA applications, with no easy fix on the VistA side. The most common manifestation of this issue is seeing text that is supposed to be 80 characters long scroll off the screen and anything that is supposed to line up not line up (see the screenshot below):

Wide Character in CJK Causing Wrapping and misalignment
Wide Character in CJK Causing Wrapping and misalignment

In Screenman and the Screen-oriented editor, editing any text that contains CJK characters is difficult as the character positioning assumptions are wrong. Both programs need to be fixed, but there was not enough time in the schedule to do that. The screen-oriented editor can be easily replaced with a call out to a program such as "nano" which knows how to handle wide characters correctly.

Sample Application Translation

This part of the project was started; but since adding the Korean ICD-10 was added to the schedule and was expected to take a long time, doing a Sample Application Translation was never fully completed. One interesting output of the short time I spent is the routine UKOP6TRA. This routine moves the hardcoded strings out of a routine and into the dialog file. There is bonus code there as well to XINDEX the new routine--I did that as a way to check that the new routine I constructed is syntactically correct.

Lexicon Update

We had a request to put the Korean ICD-10 system into the Lexicon; this was not originally on the project schedule, but ultimately we decided to pursue this over doing a Sample Application Translation. After a false start (I assumed that the Korean ICD-10 will have the same codes as the US version; which was very far from being the case); I ended up replacing the US ICD-10 with the Korean version. In a perfect world, the coding systems will be universal and the same codes can be used across countries. Unfortunately, that is not the case.

Updating the Lexicon involved a significant learning curve.

More detail in this (as yet unfinished) blog post. In case that never gets done, here are the presentations I gave on the topic: First, Second.

DataLoader

The Dataloader is a C# application written for VistA for Education that let you import data in Excel spreadsheets. It relied on a little known dll that is part of the broker called "bapi32.dll". As you may remember from earlier in this document, we changed the broker Delphi source code to ensure that the broker worked with non-ASCII languages. We therefore needed to recompile bapi32.dll to ensure that it uses the new code. Finding the source code for bapi32.dll was difficult--but eventually we found it. After compilation it turned out that we needed to modify the C# interface to bapi32.dll as well to pass Unicode rather than ASCII strings (BapiHelper.cs in VistA.DataLoader.Broker project).

The VistA DataLoader source and install instructions can be found on Github.

The Plan VI presentation can be found here or duplicated here.

QEWD and Panorama

We did not plan to spend much time in the project on using Panorama, as it does not provide as of today any production ready VistA interfaces. However, we were pleasantly surprised that it just works out of the box.

Project Outputs

Documentation and Presentations

This blog post and everything linked from it, plus everything on the VistA Internationlization Project Group functions as good documentation on what was done. For ease of reference, I have all the important presentations/blog posts in chronological order on My Website.

KIDS Build

All the M code changes to VistA that will apply to any internationalization effort are packaged in a single KIDS build, which can be downloaded from here.

CPRS Executable & Vitals DLL

All the work on CPRS (and some work on the Vitals DLL to fix the date issues) can be found ready to run here.

Docker Images

We have two ready to use Docker Images: OSEHRA VistA 6, which contains limited demo data and is thus suitable as a starting point for a database to be used in production. VEHU VistA 6 is a database full of demo data.

There is no need to install the KIDS build into VistA as all the modified code and data are already integrated. The docker images also contain the Korean ICD-10 instead of the US version.

This link has a couple of screenshots for the OSEHRA VistA 6 docker image.

Github Repositories

There are two main Github repositories:

In addition, there is an plan-vi branch on the OSEHRA-Sandbox/VistA-VEHU-M which takes commits from the OSEHRA-Sandbox/VistA-M repo and applies them over a VEHU instance in order to apply the changes made in VistA-M to a VistA VEHU instance. This is done using git format-patch ... | git am.

Testing Script

Originally we were going to have Sikuli testing framework to test the changes that were made to CPRS. However, Sikuli looks at images not content; and having it analyze and correctly decide that (for example) a date string looks correct for a specific locale is not something that it can do. We haven't quite decided yet what is the best way to do automated checking. I came up with this manual check list of things to check to ensure that all of our code works once you set the Kernel Language to Korean:

Roll & Scroll

CPRS

DataLoader