Healthcare Data Mining, Structured Data and Natural Language Processing

Medicine and healthcare have long been among the most researched and studied branches of science.  There are records of the use of medicines as early as 500 B.C.  Research and development over thousands of years has led to the establishment of the current structured healthcare system.

Documentation of patient records is an integral component of healthcare and is mandatory in many countries that rely on insurance-based healthcare systems. Early forms of healthcare documentation involved physicians keeping hand-written records of patient visits and filing this information for future reference.  Managing the records of thousands of patients on paper became impractical, and paper-based records were also vulnerable to loss in natural calamities. This led to the birth of electronic healthcare data capture and documentation.  Patient records were then managed as electronic documents, and systems such as EMRs, EHRs, and other electronic healthcare data management systems kept patient information secure and made it easily available to physicians whenever required.

Hospitals and healthcare practices across the US spend thousands of dollars every year on documenting and managing patient care details to meet the statutory requirements of the healthcare industry.  Most of this data is recorded and stored in EMRs and EHRs and is generally used only for insurance purposes or for reference.

An innovative and visionary line of thought is the use of concrete data and evidence to support medical decisions.  This is called evidence-based medicine, or EBM.  An example is available from as early as 1854, when John Snow (considered the father of epidemiology) used maps marked with bars to discover the source of a cholera outbreak in London and trace it to the water supply system.  He counted the number of deaths, plotted the victims’ addresses on a map, and saw that the deaths clustered around a common water source, the Broad Street pump.  This was one of the earliest applications of data mining.

The modern EMR of a hospital or healthcare facility is a rich treasure-house of information on thousands of patients with a wide range of illnesses, covering thousands of medicines, patient histories, and more.  Every bit of information stored in this system could be part of a pattern of events which, if studied, could give valuable insights into patterns of disease and techniques of treatment and, if researched further, could lead to predictions about disease outbreaks.

The question, however, is how we tap into this vast pool of data and extract the information we need. This could be done either by:

  1. Manually searching through thousands of documents.
  2. Creating an electronic tool to search for data and analyze patterns.

Manually searching such huge volumes of data is not a practical solution.  An electronic tool for the job would have to be an intelligent system that knows exactly what to search for, where to search for it, and how to present it in the most useful way.  Because different physicians have different styles of dictation and formats of reports, the search tool will have to separate out the required information and present what is most valuable.
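To illustrate what "separating out the required information" from differently worded reports might look like, here is a minimal sketch that pulls a blood pressure reading out of free-text notes.  The pattern, the function name extract_blood_pressure, and the sample notes are all assumptions made for illustration; this is not ezDI's method, and a real clinical NLP engine would need far more than a single regular expression.

```python
import re

# Hypothetical pattern (an assumption for this sketch): it accepts a few
# common ways a physician might dictate a blood pressure reading, such as
# "BP 140/90", "Blood pressure was 128 / 82" or "B/P: 150/95".
BP_PATTERN = re.compile(
    r"\b(?:bp|b/p|blood\s+pressure)\b\s*(?:is|of|was|:)?\s*"
    r"(\d{2,3})\s*/\s*(\d{2,3})",
    re.IGNORECASE,
)


def extract_blood_pressure(note):
    """Return (systolic, diastolic) from a free-text note, or None."""
    match = BP_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    notes = [
        "BP 140/90 on arrival.",
        "Blood pressure was 128 / 82 mmHg.",
        "Patient's B/P: 150/95",
        "No vitals recorded.",
    ]
    for note in notes:
        print(note, "->", extract_blood_pressure(note))
```

The same value is recovered from three different dictation styles, and a note without a reading yields None rather than a guess.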

For example:

Heart disease is one of the most common causes of death in the United States.

Identification of early signs of heart disease can save thousands of lives.  Analyzing a database of thousands of patients with heart disease can give valuable information about the probable causes, nature of progression, etc., of heart disease.  It can also help in developing systems that identify heart disease at the earliest signs of occurrence, leading to timely treatment and preventive techniques that can save many lives.

Natural Language Processing or NLP is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.  It began as a branch of artificial intelligence.  In theory, natural language processing is a very attractive method of human–computer interaction.  Natural language understanding is sometimes referred to as an AI-complete problem because it seems to require extensive knowledge about the outside world and the ability to manipulate it.

Combining NLP and data mining offers a way to tap into the huge resource of healthcare data and provide tangible answers to queries and problems.

EZDI is a clinical Natural Language Processing Engine that identifies and converts relevant text into codes and numbers using patented technology.

EZDI combines data mining and NLP to extract clinical information from an EMR, or any healthcare documentation system, and provides structured information on diseases, findings, procedures, microorganisms, pharmaceuticals, etc., arranged systematically according to SNOMED-CT (Systematized Nomenclature of Medicine – Clinical Terms), a computer-processable collection of medical terminology.
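As a rough illustration of turning free text into coded, structured output, the sketch below matches a handful of terms against a tiny lookup table and returns concept identifiers.  The table TERM_TO_CONCEPT and the function map_concepts are hypothetical names introduced here, the table covers only three concepts, and the identifiers shown should be verified against a current SNOMED-CT release before any real use.  It does not represent how EZDI's engine works internally.

```python
import re

# Illustrative lookup table (an assumption for this sketch). A real system
# would load concepts and synonyms from a licensed terminology release
# instead of hard-coding them.
TERM_TO_CONCEPT = {
    "myocardial infarction": ("22298006", "Myocardial infarction"),
    "heart attack": ("22298006", "Myocardial infarction"),
    "diabetes mellitus": ("73211009", "Diabetes mellitus"),
    "hypertension": ("38341003", "Hypertensive disorder"),
}


def map_concepts(text):
    """Return the coded concepts mentioned in text, in order of appearance."""
    lowered = text.lower()
    found = []
    for term, (code, label) in TERM_TO_CONCEPT.items():
        # Whole-word matching so a term is not matched inside a longer word.
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(
                {"term": term, "code": code, "label": label, "start": match.start()}
            )
    found.sort(key=lambda item: item["start"])
    return found


if __name__ == "__main__":
    note = "History of heart attack and hypertension."
    for concept in map_concepts(note):
        print(concept["code"], concept["label"])
```

Note that this sketch ignores negation ("no history of diabetes mellitus"), misspellings, and abbreviations, which are exactly the cases where a dedicated clinical NLP engine is needed.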

Key Areas of Application Include:

  • Improving the Quality of Patient Care
    • Identifying high-risk patient groups with combinations of symptoms and/or risks.
    • Identifying the need for prophylactic measures to prevent outbreaks of disease.
    • Making the prescribing of drugs more efficient by identifying duplicate or excessive prescriptions and flagging potential interactions between contraindicated drugs.
    • Searching for statistical data on patient-disease patterns and classifying patients by age, gender, geographical location, food groups, etc., to identify common factors among patients with similar diseases.
    • Identifying the need for specific diagnostic tests in specific patients, leading to effective dispensing of health care measures and eliminating unnecessary tests.
  • Ensuring Compliance of Health Care Documentation
    • ezDI’s search engine makes auditing and reporting of “medical records compliance” an automated process.
  • Revenue Generation and Saving
    • Lowering the cost and effort involved in clinical research and development through automated chart review.
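The prescribing check mentioned among the applications above can be sketched in a few lines.  This is a minimal example under stated assumptions: the function review_prescriptions and the table INTERACTING_PAIRS are hypothetical, the table holds a single well-known pair purely for illustration, and a real system would rely on a curated drug-interaction knowledge base and on structured medication data already extracted from the record.

```python
from collections import Counter
from itertools import combinations

# Illustrative only (an assumption for this sketch): one widely documented
# interacting pair. A production system would query a maintained
# drug-interaction database instead.
INTERACTING_PAIRS = {
    frozenset({"warfarin", "aspirin"}),
}


def review_prescriptions(drugs):
    """Flag duplicate entries and known interacting pairs in a drug list."""
    normalized = [drug.strip().lower() for drug in drugs]
    counts = Counter(normalized)

    # A drug listed more than once may indicate duplicate prescribing.
    duplicates = sorted(name for name, n in counts.items() if n > 1)

    # Check every distinct pair of drugs against the interaction table.
    interactions = sorted(
        tuple(sorted(pair))
        for pair in combinations(sorted(counts), 2)
        if frozenset(pair) in INTERACTING_PAIRS
    )
    return {"duplicates": duplicates, "interactions": interactions}


if __name__ == "__main__":
    report = review_prescriptions(["Warfarin", "aspirin", "Metformin", "metformin "])
    print(report)
```

Normalizing names before counting is what lets "Metformin" and "metformin " be recognized as the same entry; in practice that normalization step is where most of the difficulty lies.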

ezDI is a tool built for evidence-based medicine and treatment.  With accuracy of up to 98% and immediate availability of query results, ezDI aims to be the future of clinical data analytics, helping to ensure more effective and efficient healthcare delivery.



By designing a next-generation clinical NLP engine supporting advanced documentation and coding functions, EZDI turned its vision into reality. Its CAC and CDI solutions have received strong feedback for system accuracy and ease of use. EZDI removes the data complexity and highlights what matters for healthcare professionals.

EZDI is a provider of AI-based mid-revenue cycle management solutions to Hospitals and Health Systems.
