2006-01-31 00:00:00

De-identifying protected health information (PHI)

A number of clients have been asking me about protected health information (PHI) solutions so I thought I’d put out a general call for help from my esteemed readers. What I’m looking for is a general-purpose data de-identification library (preferably open source) that I could use in both OSS and commercial systems. Even if it costs money, I’d love to hear about it.

The idea is to find PHI automatically in any arbitrary data packet (HL7, e-mail, database, etc.) and then flag it, apply a one-way hash, tokenize it, add it to a dictionary, and so on. There are many uses for this kind of software, from automatically scanning outgoing e-mails to protecting sensitive data within databases so that data can be aggregated and shared.
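To make the idea concrete, here is a minimal sketch in Python of the flag, hash, tokenize, and dictionary steps. It is only an illustration under stated assumptions: the three regular expressions, the token format, and the names `PHI_PATTERNS`, `make_token`, and `deidentify` are all hypothetical, and real PHI detection needs far broader coverage (names, dates, addresses, record numbers) than a few patterns can give.

```python
import hashlib
import hmac
import re

# Hypothetical patterns for illustration only; they cover US-style
# Social Security numbers, phone numbers, and e-mail addresses.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def make_token(kind, value, secret_key):
    """One-way keyed hash (HMAC-SHA256), shortened into a readable token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[{kind}:{digest[:12]}]"


def deidentify(text, secret_key):
    """Return (scrubbed_text, dictionary mapping token -> original value)."""
    dictionary = {}
    for kind, pattern in PHI_PATTERNS.items():
        def replace(match, kind=kind):
            token = make_token(kind, match.group(0), secret_key)
            dictionary[token] = match.group(0)  # flag it and remember it
            return token
        text = pattern.sub(replace, text)
    return text, dictionary


if __name__ == "__main__":
    message = "Patient SSN 123-45-6789, call 412-555-0100 or jane@example.org."
    scrubbed, found = deidentify(message, b"replace-with-a-real-secret")
    print(scrubbed)
    print(found)
```

Because the hash is keyed, the same value always maps to the same token under the same key, so scrubbed records can still be joined and aggregated. The dictionary can re-identify every token, so it has to be protected at least as carefully as the original data.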

A thought leader in the de-identification and data privacy space is Dr. Latanya Sweeney, who teaches at CMU. She has some really nice stuff, though I don’t think it’s open source or available online.

A commercial firm that does this stuff is called De-ID. NIH’s National Cancer Institute uses them for their projects, too, so it’s probably pretty good.

Please drop me some comments here if you know of other researchers who might have some stuff available that they can share. While finished products would be great, even research projects are welcome.

Filed under: — @ 2006-01-31 00:00:00
2006-01-31 00:00:00

How to manage tape backups in health IT shops

These days we’re hearing more and more about how backup tapes, which are crucial for business continuity and disaster recovery purposes, are getting lost due to carelessness or transfer problems. Many of you are CXOs in charge of technology and information systems and I’m hoping you have had a chance to review your off-site backup tape storage policies. Off-site storage is crucial if you use backup tapes so that a fire or other disaster doesn’t take your backups along with your primary systems. However, about 30% of you still don’t put your backup tapes in off-site storage. This was true of a number of hospitals and care providers in Louisiana and Mississippi and they lost significant data. There’s no getting it back.

If you are going to back up your systems to tape, you need to have a mechanism to track the tapes. Backup tapes are a favorite means by which company insiders take off with information, and thieves will also try to steal data through backup tapes.

Most stories that reveal data theft through backup tape transfers forget to mention something important: thieves take backup tapes because they can easily get data off them. Most db admins and system admins don’t do simple stuff like encrypt their data on backup tapes. If you’re responsible for your IT department, please be sure to check that all data on backup tapes is encrypted. It’s easy to do, and if your backup tapes are ever stolen it won’t give the thieves an easy mark. Oh, and don’t use simple symmetric encryption; make it a bit harder by applying dual-key encryption or something more difficult to crack.

The moment a backup tape is created, it needs to be uniquely identified and labeled. Try using a barcode if possible. At the time the tape is created, the ID of the tape should be entered into a physical and an electronic log. The physical log should be just a log book with signatures, and the electronic log can be a simple spreadsheet that can be searched later. Both a physical and an electronic log should be kept in your primary and secondary locations. There should be columns in the log that track who created the tape, what’s on it, and who it was handed to (courier service, etc.). If a courier service picks up the tapes (not a good idea unless they are specialists), the courier needs to sign off appropriately, and you need to ensure that the courier is known by name and logged in the log book and spreadsheet.
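As a rough sketch of the electronic log, the Python below appends one row per tape to a CSV file that any spreadsheet can open and search. The column names, the `TapeLogEntry` class, and the `append_entry` helper are my own assumptions for illustration; use whatever columns your process actually needs.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TapeLogEntry:
    tape_id: str      # barcode or other unique label
    created_at: str   # when the tape was created (ISO 8601 text)
    created_by: str   # who created the tape
    contents: str     # what is on the tape
    handed_to: str    # staff member or named courier who took it


def append_entry(log_file, entry):
    """Append one row to an open CSV log, writing the header if the log is empty."""
    writer = csv.DictWriter(log_file, fieldnames=[f.name for f in fields(TapeLogEntry)])
    if log_file.tell() == 0:
        writer.writeheader()
    writer.writerow(asdict(entry))


if __name__ == "__main__":
    # An in-memory buffer stands in for a real file opened in append mode.
    log = io.StringIO()
    append_entry(log, TapeLogEntry(
        "T-0001", "2006-01-31T18:00:00", "jdoe",
        "nightly database backup", "A. Smith (staff)"))
    print(log.getvalue())
```

The electronic log complements the signed log book; it does not replace it.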

After identification and labeling, but before transfer, the backup tape should be put under lock and key in a lock box. If you are not using a courier service (a good idea), one of your staff should be made responsible for getting the transfer done and ensuring logs on both sides (source and destination) are filled out properly. An executive (or manager) should review the logs regularly to make sure there are no discrepancies.
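The regular review can be partly automated. The sketch below assumes that each location's log can be reduced to a list of tape IDs, and the function name `find_discrepancies` is hypothetical. It reports tapes that left the source but were never logged at the destination, and tapes that turned up at the destination without a source entry.

```python
def find_discrepancies(source_ids, destination_ids):
    """Compare tape IDs logged at the source with those logged at the destination."""
    source, destination = set(source_ids), set(destination_ids)
    return {
        "missing_at_destination": sorted(source - destination),
        "unexpected_at_destination": sorted(destination - source),
    }


if __name__ == "__main__":
    report = find_discrepancies(
        ["T-0001", "T-0002", "T-0003"],   # IDs from the primary site's log
        ["T-0001", "T-0003", "T-0009"],   # IDs from the off-site log
    )
    print(report)
```

Any non-empty list in the report is a reason to pull out the signed log books and find out where the tape went.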

None of the techniques I’ve discussed is inexpensive or trivial to implement. However, if you’ve got a solid process in place, you can keep the name of your organization out of the newspapers, something your board and CEO might appreciate :-)

By the way, if you’ve got a little bit more cash you can eliminate backup tapes altogether and create an active disaster site with full electronic backups in a secondary location. This way, there are no issues with transfers, tapes, etc. For example, I have clients that have a primary data center and then a secondary data center where we store snapshots (backups) on hard drives instead of tapes. Then, the snapshots are placed onto optical media and stored in the secondary location in a vault.

Filed under: — @ 2006-01-31 00:00:00
2006-01-31 00:00:00

EJK’s no-contact thermometer

Engadget reported on EJK’s no-contact thermometer this morning. Pretty slick.

Contact-less thermometer

Filed under: — @ 2006-01-31 00:00:00
2006-01-29 00:00:00

Light HL7 Library for Java

Mike, who works at Cleveland Clinic, has released the Light HL7 Library for Java. Here’s how he describes it:

The Light HL7 Library lets you simply parse, modify and create HL7-like messages in Java. This is the same parsing library used by the HL7 Browser and HL7 Comm, and internally at CCF, so it has literally parsed millions and millions of records.

Filed under: — @ 2006-01-29 00:00:00
2006-01-29 00:00:00

Math Will Rock Your World

BusinessWeek has a nice article in last week’s issue: Math Will Rock Your World.

Most health IT applications, e-Health services, etc., are usually nothing more than glorified data entry systems. Let’s take some of the advice of the entrepreneurs cited in the article and move towards better analysis of the information we’ve been gathering in our systems for decades. Math and data mining could actually bring real value to our health IT apps.

Filed under: — @ 2006-01-29 00:00:00