
Introduction to Health-IT Lingo

The field of Health-IT is rather interesting, but health-IT lingo is overwhelming. There are a lot of changes taking place, a lot of new terms popping up, and it’s hard for the patient-facing physician to keep up.

As a physician who does healthcare consulting, I wanted to break down some of the terms. These are worth becoming familiar with if you ever plan to dabble in Health-IT as a healthcare consultant.

Even if you’re not interested in anything but clinical medicine, Health-IT is making waves and might just transform our healthcare in positive ways. The desirable future jobs in healthcare might require you to know how to play well with engineers and the Health-IT teams.

I have thrown a ton of terminology into this post but I try to explain each with a real-world example. My personal interest is in artificial intelligence and data science, hence the slant towards that.

By no means do you need to master these topics; that’s what data scientists and biomedical engineers are there for. But the information you provide as a physician to a consulting client has to somehow be converted into health-IT lingo that a computer will understand. That’s what a lot of these topics are geared towards.

Optical Character Recognition (OCR)

OCR is what’s used to recognize your passport when you swipe it through a machine. It can also be used to scan a clinical document and convert that image into text.

There are subsets of OCR such as Intelligent Character Recognition (ICR) and Intelligent Word Recognition (IWR), which can help convert handwritten content into digital text.

If you’re a large medical group and you need a ton of old charts digitized, OCR is a must. The text becomes searchable and takes up less space, as opposed to just scanning each page and storing a bloated image file.

Natural Language Processing (NLP)

Natural language refers to the words we speak and write and the conversations we have as humans. NLP is where we try to get a computer to extract meaning from that spoken or written language. Voice assistants like Alexa and Siri are examples: they use speech recognition to turn your voice into text and NLP to work out what you meant.

The way text messaging replaced phone calls, voice is replacing, and will keep replacing, the way we perform searches and interact with our phones and computers.

Voice technology is fascinating. This study was able to find a link between voice characteristics and coronary artery disease (CAD). Imagine having your telemedicine software listen to the patient’s voice and recognize wheezing or predict a certain disease.

Imagine sitting in an exam room with a patient and having a conversation. The entire conversation isn’t recorded, but the NLP software captures and summarizes the visit and spits out a SOAP note. To go a step further, the software might one day also flag you because it noticed that you’re planning on performing a knee replacement on this patient, but the patient is suspected to have CHF or another condition which makes them a terrible candidate for surgery.

Hidden Markov Models

This is a statistical model used for speech recognition – part of the NLP discussion above. It’s used for many other processes as well, but I mention it because many NLP systems are still based on this.
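To make the idea less abstract, here is a minimal sketch of the Viterbi algorithm, one common way of asking a Hidden Markov Model "what hidden sequence most likely produced what I observed?". It uses the classic textbook toy problem of guessing whether a patient is Healthy or has a Fever from reported symptoms. All of the probabilities are made-up illustration values, not clinical data, and real speech recognition models are far larger.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states and its probability."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that path came from)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)

    # Walk backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, probability


# Toy model: every number below is invented for illustration.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, probability = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

In speech recognition the "symptoms" would be slices of audio and the hidden states would be the sounds or words being spoken, but the bookkeeping is the same.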

Statistics and math will come up a lot when dealing with healthcare information technology. Algebra, calculus, stats, and even geometry. But, again, you don’t have to know it, just be familiar with how the nerds codify your knowledge.

Language processing is undergoing some interesting changes. More and more designs are focusing on using deep learning instead of statistical prediction methods.

Deep Learning (DL)

Deep learning is part of machine learning; it’s the child of machine learning. As the words indicate, it’s a way to teach a computer how to learn. The idea is to imitate the human brain and use artificial neural networks (ANNs) to simulate learning.

The “architecture” of DL could be something like deep neural networks which are used for speech recognition and natural language processing (NLP). It’s even used for pharmaceutical drug design.

The term “deep” refers to the many layers which are built into this type of machine learning. For example, deep learning that’s used to interpret an EKG would first identify the edges of the paper, then the tracing, then determine if it’s a properly done EKG, then interpret the various measurements, and so on.
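Here is a minimal sketch of what "layers" means in code: the output of one layer becomes the input of the next. The weights below are arbitrary numbers picked for illustration; in a real network they are learned from data, and there are many more layers with many more numbers in each.

```python
def relu(values):
    """A common activation function: negative numbers become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked weights; a trained model would learn these.
inputs = [1.0, 2.0]
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.5]

hidden = relu(dense(inputs, hidden_weights, hidden_biases))  # first layer
output = dense(hidden, output_weights, output_biases)        # second layer

print(hidden)  # [0.0, 3.0]
print(output)  # [-2.5]
```

Stacking more `dense` calls between the input and the output is what makes the network "deeper".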

When it comes to DL, you’ll hear the terms supervised and unsupervised learning. Supervised learning is when the model is trained on data that a human has already labeled with the right answer, for example EKGs that a cardiologist has already read. Unsupervised learning is the nectar of DL; most of the data that’s out there hasn’t been labeled in any way, and the deep learning tool sifts through it and finds patterns on its own.

Read: huge opportunities here for the clinician interested in fusing healthcare and IT.

Think of supervised learning as having to babysit the computer, and unsupervised learning as letting it get better on its own. More on this in a bit.
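A toy sketch of the difference, using invented heart-rate readings. In the supervised half, a human has attached a label to every reading and the code learns from those labels. In the unsupervised half, the code only gets the raw numbers and has to find the two groups by itself. The function names and the data are hypothetical, and the methods (nearest average, and a two-group version of k-means) are deliberately the simplest ones available.

```python
def train_centroids(labeled):
    """Supervised: learn the average value for each human-provided label."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(centroids, value):
    """Assign a new value to the label whose average is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, rounds=10):
    """Unsupervised: split unlabeled values into two groups (k-means, k=2).

    Assumes the values are not all identical.
    """
    low, high = min(values), max(values)
    for _ in range(rounds):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return sorted(low_group), sorted(high_group)


# Invented readings, labeled by a (hypothetical) clinician.
labeled = [(60, "normal"), (72, "normal"), (65, "normal"),
           (130, "fast"), (145, "fast"), (138, "fast")]
centroids = train_centroids(labeled)
print(predict(centroids, 120))  # fast

# The same readings with the labels thrown away.
unlabeled = [value for value, _ in labeled]
print(two_means(unlabeled))  # ([60, 65, 72], [130, 138, 145])
```

Notice that the unsupervised version finds the two groups but has no idea what to call them; naming them is still the clinician’s job.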

Machine Learning (ML)

Machine learning is the child of artificial intelligence. It’s a set of algorithms by which computer systems learn to complete tasks without specific instructions from humans.

Your email filter uses machine learning to figure out what to send to spam. It’s the kind of process which heavily relies on statistical math. Your input (clicking the “spam” button) is part of feedback which ML uses to get better.

The main purpose of ML is for it to make predictions. But it uses previously known facts which the engineer might program into the model. The distinction between ML and “data mining” is that in the latter the learning is more blind.

Other terms you might hear when we talk about ML and medicine are: game theory, reinforcement learning, Markov Decision Process (MDP), and Bayesian Networks.

One of the downsides to ML is that it requires a lot of data to train the model. We also need some synthetic data which are then vetted against each other, from which the ML model learns and is able to make more accurate predictions. As a physician I provide a lot of this synthetic data based on my experience as a clinician.

Disease prediction modeling, which is the work I do as a healthcare consultant, requires the expertise of a clinician to create the model and it requires actual patient data to train the model.

Game theory comes into play when we want the model to play a “game” in order to improve its prediction abilities. A great research article on this topic can be found here.

Vector Space Modeling

This is an algebra-based mathematical model which is used to index a body of text in order to make it searchable.

This is important in healthcare because we tend to have a lot of text which needs to be searchable and retrievable quickly and efficiently. You wouldn’t want to reorder a test which a patient already had done; especially when that patient changes from one medical group to another.

The Vector Space Model is one type of information retrieval system. Another one, and a more popular one, is the Standard Boolean Model.

You can imagine how important information retrieval is to a search engine like Google, as well as to the doctor who is searching for a possible diagnosis or test in the patient’s massive EHR.
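Here is a minimal sketch of the vector space idea: turn each note into a list of word counts (a vector), turn the search query into the same kind of vector, and rank the notes by how closely their vectors point in the same direction (cosine similarity). The notes are invented, and a real system would also weight rare words more heavily and handle punctuation, plurals, and synonyms.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = unrelated)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, notes):
    """Return note IDs ordered from most to least similar to the query."""
    query_vector = vectorize(query)
    return sorted(
        notes,
        key=lambda note_id: cosine(query_vector, vectorize(notes[note_id])),
        reverse=True,
    )


# Invented chart snippets.
notes = {
    "note1": "chest pain with shortness of breath",
    "note2": "knee pain after fall",
    "note3": "routine follow up for diabetes",
}

print(search("chest pain", notes))  # ['note1', 'note2', 'note3']
```

The note about chest pain ranks first because it shares both query words, the knee pain note shares one, and the diabetes note shares none.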

Naive Bayes Classifiers (NBC)

This is used in machine learning and is based on Bayes’ theorem. I tried to understand how this relates to the work I do, but failed. It has come up enough that I thought I would throw it in here.

You’ll hear it being referred to also as naive Bayes, or simple Bayes or independence Bayes.
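One place naive Bayes shows up is the spam filter from the machine learning section. Here is a minimal sketch, assuming a handful of invented emails that someone has already marked as "spam" or "ham" (not spam). The classifier counts how often each word appears under each label, then uses Bayes’ theorem to pick the more probable label for a new message. It is "naive" because it treats every word as independent of the others. The add-one smoothing is a standard trick so that a word never seen before doesn’t zero out the whole score.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label and how many messages each label has."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Pick the label with the highest log P(label) + sum log P(word | label)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing for words unseen under this label.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training messages.
examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("lab results for your patient", "ham"),
    ("meeting about patient schedule", "ham"),
]
word_counts, doc_counts = train(examples)

print(classify("free money prize", word_counts, doc_counts))      # spam
print(classify("patient lab schedule", word_counts, doc_counts))  # ham
```

Swap "spam" and "ham" for "has the disease" and "doesn’t", and words for symptoms, and you have the same idea applied to a simple disease prediction model.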

Unified Medical Language System (UMLS)

The UMLS brings together all of the scientific lingo which we use in medicine. It’s maintained by the US National Library of Medicine and regularly updated. UMLS eases access to such a massive library of biomedical terms.

We are talking about millions of terms, not thousands. Think of all of the terms within ICD coding and CPT, SNOMED, and DSM, etc.

Being familiar with this system is all that is needed. Most of you will already know the terminology. You likely won’t have to manipulate the database but you might have to work with an engineer whose job it is to make sense of the UMLS.


ICD-10

ICD-10 is the 10th revision of the International Classification of Diseases (ICD), which is maintained by the WHO. And guess what, ICD-11 is right around the corner – 2022.

Knowing the terminology and knowing how to interact with this database can be beneficial to a physician interested in Health-IT.

To go a step further, each specific ICD-10 code has a payment model associated with it. As you can imagine, a company that wants to go after the highest paying codes may want the expertise of a clinician who knows how CMS reimburses each of these codes.


SNOMED CT

This is a standardized terminology used to create “doctor speak”. So, all of those medical terms used in eliciting signs and symptoms, and other medical jargon, fall under this.

Understanding the hierarchy and relationship is the toughest part about this for a physician. The terms will be quite familiar to you, otherwise.

CPT Code

Current Procedural Terminology (CPT) is maintained and copyright-protected by the AMA. This is how medical services and interventions are communicated between various entities in healthcare.

It’s very similar to ICD-10, with the main difference being that it focuses on the services rendered; think, money that can be made from an intervention.


LOINC

Logical Observation Identifiers Names and Codes (LOINC) is used in the medical lab industry. It assigns a short numeric code, ending in a check digit, to each lab test and clinical observation; 2345-7, for example, is a serum or plasma glucose.

If you order a specific lab test or procedure and you want to be reimbursed for it, then you better have a coding mechanism in order to get paid.
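As a small illustration of how an engineer might sanity-check these codes, here is a sketch that validates the check digit at the end of a LOINC code. It assumes the check digit follows the Luhn-style mod-10 scheme, which matches the well-known codes used below; treat that as my assumption and check the LOINC documentation before relying on it. A valid check digit only means the code wasn’t mistyped, not that it exists in the database.

```python
def loinc_check_digit(number_part):
    """Compute a mod-10 (Luhn-style) check digit for the numeric part of a code."""
    total = 0
    for position, char in enumerate(reversed(number_part)):
        digit = int(char)
        if position % 2 == 0:  # rightmost digit, then every second digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(code):
    """True if the code looks like 'digits-checkdigit' and the check digit matches."""
    number_part, separator, check = code.partition("-")
    if not (separator and number_part.isdecimal()
            and check.isdecimal() and len(check) == 1):
        return False
    return loinc_check_digit(number_part) == int(check)


print(is_valid_loinc("2345-7"))  # True  (glucose, serum or plasma)
print(is_valid_loinc("718-7"))   # True  (hemoglobin, blood)
print(is_valid_loinc("2345-8"))  # False (wrong check digit)
```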


RxNorm

This is part of UMLS terminology and contains all the FDA-approved medications in the US. It is maintained by the National Library of Medicine.

If you want to communicate medications between different entities, you want to have a standardized way of doing so. HealthIT engineers will need to know how to work with such a database. But for the physician, it’s enough to just know that it exists.

HL7-FHIR Data Standards

Fast Healthcare Interoperability Resources (FHIR) is a standard from HL7 used for exchanging information between EHRs. You’ve probably heard the term “interoperability”. And if you haven’t, you’ll hear a lot more about it in the future.

Patients move from one medical system to another. With so many different EHRs and so much of the patient’s information dispersed, HL7’s FHIR is touted to be a way of exchanging information through APIs.
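To give a feel for what gets exchanged, here is a minimal sketch that reads a FHIR Patient resource. FHIR data is commonly sent as JSON, and fields like `resourceType`, `name`, `gender`, and `birthDate` are part of the Patient resource. The patient below is invented, and `summarize_patient` is a hypothetical helper of mine, not part of any FHIR library. In practice the JSON would come back from an EHR’s API rather than being typed into the code.

```python
import json

# An invented, heavily trimmed Patient resource.
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-123",
  "name": [{"family": "Rivera", "given": ["Ana", "Maria"]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""


def summarize_patient(raw):
    """Pull a few human-readable fields out of a Patient resource."""
    resource = json.loads(raw)
    if resource.get("resourceType") != "Patient":
        raise ValueError("not a Patient resource")
    name = resource["name"][0]  # a patient can have several names; take the first
    full_name = " ".join(name.get("given", []) + [name.get("family", "")]).strip()
    return {
        "id": resource["id"],
        "name": full_name,
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


summary = summarize_patient(patient_json)
print(summary["name"])  # Ana Maria Rivera
```

The point of the standard is that any EHR speaking FHIR would send the patient in this same shape, so the receiving system doesn’t need custom code for every vendor.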


Python

This is a specific programming language. It’s a very popular language because it’s easy to understand, easy to learn, and it has an extensive standard library.

As a physician you don’t need to learn Python – I don’t see how that would benefit you. You just need to understand how it works. Watch a YouTube channel on it or take a Duolingo-style course on it and you’ll get the gist rather quickly.

Regular expressions (RegEx)

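Think of this as a mini language for describing search patterns. When you press [Control+F] to do a word search of a webpage, you’re looking for one exact word; RegEx goes a step further and lets you search for a pattern, such as “two or three digits, a slash, then two or three more digits”.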

It’s also used for pattern matching techniques when dealing with databases: data validation, data scraping, and simple parsing.

Searching databases is a very important task in healthcare. If you want to do a research study or perform pattern matching in a huge database of patients, you’ll need to search that text and database for the right information.
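Here is a minimal sketch of both uses, pattern matching and data validation, with Python’s built-in `re` module. The note text is invented and the patterns are deliberately simple; the blood pressure pattern, for instance, would also match a date written like 03/04, so a real project would need stricter rules.

```python
import re

# Two or three digits, a slash (spaces allowed around it), two or three digits.
BP_PATTERN = re.compile(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b")

# Four digits, dash, two digits, dash, two digits (checks the shape only).
ISO_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_blood_pressures(text):
    """Pattern matching: find every systolic/diastolic pair in free text."""
    return [(int(sys), int(dia)) for sys, dia in BP_PATTERN.findall(text)]


def is_iso_date(value):
    """Data validation: does the whole value look like YYYY-MM-DD?"""
    return ISO_DATE_PATTERN.fullmatch(value) is not None


note = "BP 142/91 on arrival, repeat BP 128 / 84 after rest. Weight 180 lbs."
readings = extract_blood_pressures(note)

print(readings)                   # [(142, 91), (128, 84)]
print(is_iso_date("2021-03-04"))  # True
print(is_iso_date("3/4/21"))      # False
```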


Elasticsearch

Imagine that you want to do a study about maternal mortality. You purchase a massive database from Medicaid and now you need to sift through tens of millions of data points.

Elasticsearch is a search engine, made by a company called Elastic, which makes this part easy for you. You upload your data there and it allows you to search and manipulate it in all sorts of ways.

It is based on the open source (free) indexing library called Apache Lucene, or Lucene, for short.

Imagine being able to upload thousands of gigabytes worth of recorded conversations between patient and doctor and then making that information searchable. That’s the purpose of such indexing tools.
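The core trick behind this kind of indexing is the inverted index: instead of reading every document each time you search, you build a lookup table once that maps each word to the documents containing it. Here is a toy sketch in plain Python with invented records. It is not Elasticsearch’s or Lucene’s actual code or API, just the underlying idea at a tiny scale.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:")].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented records.
documents = {
    1: "Patient reports postpartum hemorrhage.",
    2: "Postpartum visit, no complications.",
    3: "Hemorrhage after knee surgery.",
}
index = build_index(documents)

print(search(index, "postpartum hemorrhage"))  # {1}
print(search(index, "hemorrhage"))             # {1, 3}
```

Looking up a word in the table takes about the same time whether you have three records or thirty million, which is why this approach scales.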

Data Visualization (Dataviz)

I know that I’m talking a lot about data. But remember, we’re creating a shit-ton of data and not using it in any way. For example, as a surgeon I might perform a ton of surgeries and document the complications and successes. But it often takes nearly a decade for that information to make it into a research paper and for the medical establishment to change their procedures.

The purpose of data science is to translate any kind of data that’s created in healthcare and make it available for interpretation immediately. Why should a surgeon rely on their own 10,000 cases to learn from, when they can learn from 10 million cases in real time?

All dataviz refers to is that you take data and transform it into something visual. It helps with communicating the gist of the data to a broader and less technical audience. It’s just one of the many steps in analyzing data.

Matplotlib, for example, is a library for Python which helps in data visualization.

I’ll also mention R, which is a programming language used by statisticians to perform data analysis. R can also produce charts and graphs. So, for example, you might find yourself on a video conference call where the data engineer is sharing a visual interpretation of data which you provided, and they used R to create that visualization.

Amazon Web Services (AWS)

This is a very profitable arm of Amazon which provides cloud computing space to companies and individuals.

From networking to storing information, companies need cloud space and they pay good money for this. You will hear this term a lot, so I decided to include it in here. There are other cloud computing companies besides Amazon’s AWS.


GitHub

GitHub is where programmers store and share the code for their software. It hosts the project’s code, keeps track of every change, and makes it accessible to other teams for modification.

The point of GitHub is collaboration between developers in order to make the project as successful as possible. One programmer can take something that’s already been developed and expand upon that or improve upon it.

The real-world example would be a text document which has multiple editors. Everyone adds their own comments and revisions to it and it’s all trackable. On GitHub, instead of a text document, it’s code that’s being edited and shared.


Interoperability

There are a lot of definitions for interoperability. The purpose of interoperability is to advance healthcare. If every new software or medical device company creates their own system which doesn’t play well with others, then healthcare and medicine will be kept in the dark ages.

Interoperability is an initiative to make data available seamlessly among all sorts of platforms. A classic example is a patient’s health data; being able to access it from one EMR to the next.

Ever tried reading a patient’s MRI by sticking a CD or DVD into your computer? Then clicking install. Then having that installation get blocked. Then having to click on the images individually? Not very interoperable.
