Medical Knowledge Mining

“It’s easy to make perfect decisions with perfect information. Medicine asks you to make perfect decisions with imperfect information.” ― Siddhartha Mukherjee, The Laws of Medicine

Cyberneticcare's Population Health Solutions is a science-driven company. Cyberneticcare's Population Health Solutions and Precision Genomics help hospitals and doctors make perfect decisions with imperfect information. We address the complex challenges of medicine, systems biology, genomics, and human diseases and their interactions through quantitative mathematical models. We use AI (Artificial Intelligence) tools to combine EMR (Electronic Medical Records), laboratory data, radiology data, and genomic data with biological databases and research results to deliver AI (Actionable Insights) that offer better health and better outcomes at a better price. We present here a set of hypothesis-generating systems that derive biomedical knowledge from hospital data. An EMR captures the narrative of each encounter between the patient and the clinician or proceduralist, without any built-in intelligence. We have added that intelligence to unlock the medical knowledge held in such data. This knowledge will help reduce the disease burden, increase accuracy, and make healthcare affordable. From the clinical data we created data warehouses and data marts, which we used for analytics and knowledge discovery with various statistical and quantitative models.

Synthetic systems are created by humans, and their attributes relate to perceivable objects. All attributes of such systems are known, and they are quantitative (numeric). Human biology and human disease, by contrast, are creations of nature with many unknown factors. The majority of their attributes are qualitative and have no meaning without context. In statistical terms these are categorical (non-numeric, textual) variables, often held within unstructured data. Moreover, a biological system is multi-scale: it ranges from nanometers (10^-9 meters) at the enzyme level to thousands of kilometers (10^7 meters) at the population level. Biological systems are also often multimodal, in that the same molecule works differently in different settings. For example, the MYC gene normally regulates cell growth as a proto-oncogene, but when pathogenically altered it functions as an oncogene that drives cancer proliferation. An analytic system for biomedicine needs to deal with such complex scenarios.

Doug Laney, an analyst then at META Group (later acquired by Gartner), introduced the concept of the 3Vs of big data in 2001 in the context of synthetic systems. Big data in medicine, or biomedicine to be precise, differs from big data as it is otherwise understood. In our experience, big data in biomedicine is best defined through 7Vs.

The 7Vs of biomedical big data:

  1. Volume (physical volume or the size of the data)
  2. Velocity (speed at which an actionable request is serviced)
  3. Variety (heterogeneity of the data – multi modal – structured/unstructured)
  4. Veracity (security, confidentiality, and reliability)
  5. Vexing (algorithmic complexity to process large volume of data)
  6. Variability (scale of data – from nm (10^-9 m) to thousands of km (10^7 m); mainly categorical & non-numeric)
  7. Value (actionable insight, context based, and functional knowledge)

A patient record entered into an EMR/EHR documents a single interaction between doctor and patient. It is ONE-dimensional data. One-dimensional data is sufficient to treat a symptom. Treating a disease, however, requires analysis of an episode (multiple interactions with many doctors and diagnostic centers), which needs at least TWO-dimensional data. Treating a patient who has a disease needs at least THREE-dimensional data, covering multiple diseases treated during multiple episodes on a temporal scale (including familial, inherited diseases). Population health, in contrast, needs MULTI-dimensional data that includes millions of patients with thousands of diseases, over millions of episodes and billions of encounters. The spatial and temporal disease information, together with genomic knowledge, needs to be summarized and stored in a machine-understandable form. Health data is fragmented and distributed across many hospitals, clinics, laboratories, and diagnostic centers. All these data must be cleaned, normalized, combined, and then analyzed to assess the health of a population.
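The roll-up from encounters to episodes, patients, and population can be illustrated with a minimal Python sketch. It is not the platform's actual data model: the `Encounter` record, its field names, and the `roll_up` helper are all hypothetical, chosen only to show how one-dimensional encounter rows might be grouped into the higher-dimensional views described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Encounter:
    """One doctor-patient interaction (hypothetical schema, for illustration)."""
    patient_id: str
    episode_id: str
    disease: str
    day: int  # days since an arbitrary reference date


def roll_up(encounters):
    """Group encounters into episodes and patients, then count patients per disease."""
    episodes = defaultdict(list)  # (patient, episode) -> list of encounters
    patients = defaultdict(set)   # patient -> set of diseases seen over time
    for enc in encounters:
        episodes[(enc.patient_id, enc.episode_id)].append(enc)
        patients[enc.patient_id].add(enc.disease)

    prevalence = defaultdict(int)  # disease -> number of distinct patients
    for diseases in patients.values():
        for disease in diseases:
            prevalence[disease] += 1
    return dict(episodes), dict(patients), dict(prevalence)


encounters = [
    Encounter("p1", "e1", "diabetes", 0),
    Encounter("p1", "e1", "diabetes", 7),
    Encounter("p1", "e2", "hypertension", 40),
    Encounter("p2", "e3", "diabetes", 3),
]
episodes, patients, prevalence = roll_up(encounters)
```

Here `episodes` is the two-dimensional view, `patients` the per-patient view across episodes, and `prevalence` a very small population-level summary. Real data would first need the cleaning and normalization steps described above.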

Healthcare systems such as EMR (Electronic Medical Records) or EHR (Electronic Health Records) deployed in hospitals are data storage and retrieval systems without any intelligence. Our analytics system sits on top of this data and mines the clinical records to extract medical knowledge for use at the point of care. This knowledge will help improve the speed, testability, repeatability, quality, accuracy, safety, and cost of patient care. The analytics platform for Medical Intelligence from Cyberneticcare's Population Health Solutions and Precision Genomics is an example of AI (Artificial Intelligence and Actionable Insight) for biomarker and knowledge discovery from clinical, medical, and genomic data that is multi-scale and dimension-sensitive. Its aim is to transform reactive, medicine-driven care into proactive, medicine-driven cure, and to make healthcare accurate, predictive, and affordable. All these big data are normalized, integrated, complemented, and supplemented into a knowledge base through a big-data analytics system constructed by Precision Genomics. This knowledge will be used for decision-driven, evidence-based precision medicine, and for evidence-based, precision-medicine-driven decisions, to achieve the Triple Aim.

We present here the administrative, functional, and biomedical knowledge discovered from the hospital data. These are:

  1. Administrative & Operational: Key Performance Indicator (KPI) driven Operational, Administrative, and Financial analysis
  2. Population: Visual representation of administrative and clinical knowledge at the population level from medical records
  3. In/Out Patients: Visual representation of administrative and clinical knowledge from IN and OUT Patient data
  4. Disease Networks: Disease interactions and comorbidity in (a) circulatory, (b) metabolic, (c) neoplastic, and (d) diseases in the population
  5. Disease Association: Association Rules analysis between comorbid diseases within a patient
  6. Seasonal & Temporal: Temporal (seasonal) Characteristics of diseases in the population
  7. Geo-Spatial: Biostatistics and Epidemiology of diseases influenced by key drivers like (a) socioeconomics, (b) environment, (c) lifestyle, and (d) genomics
  8. Lab-EMR Integration: Integration of Clinical records and Lab records for predictive medicine
  9. Disease Risks: Framingham 10-year Cardiovascular disease risk score
  10. Genomics: Analysis of exome data (discovery of pathogenic mutations and copy numbers) of (a) breast cancer and (b) lung cancer patients, and
  11. Cancer Staging: Calculation of the cancer stage from TNM data, covering top cancers like (a) Lip & Oral cancer, (b) Breast cancer, (c) Lung cancer, (d) Colorectal cancer, (e) Liver cancer, and (f) Stomach cancer
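
As an illustration of the Disease Association item in the list above, the following Python sketch computes the three standard association-rule measures (support, confidence, and lift) for a rule of the form "disease A implies disease B". It is a generic textbook calculation, not a description of the platform's implementation. The `rule_metrics` function and the toy patient data are hypothetical.

```python
def rule_metrics(patient_diseases, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    patient_diseases: list of sets, one set of diagnosed diseases per patient.
    """
    n = len(patient_diseases)
    if n == 0:
        raise ValueError("at least one patient is required")

    has_a = sum(1 for d in patient_diseases if antecedent in d)
    has_c = sum(1 for d in patient_diseases if consequent in d)
    has_both = sum(
        1 for d in patient_diseases if antecedent in d and consequent in d
    )

    support = has_both / n                               # P(A and B)
    confidence = has_both / has_a if has_a else 0.0      # P(B | A)
    lift = confidence / (has_c / n) if has_c else 0.0    # P(B | A) / P(B)
    return support, confidence, lift


# Toy data, invented purely for illustration.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension"},
    {"diabetes"},
    {"asthma"},
]
support, confidence, lift = rule_metrics(patients, "diabetes", "hypertension")
```

A lift above 1 suggests that the two diseases co-occur more often than they would if they were independent. Whether that reflects true comorbidity is a clinical question that the numbers alone cannot answer.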

© Cybernetic Care Pvt Ltd, 2022