Machine Learning – Thoughts on Medicine

The importance of providing Labels and Structure to your Healthcare Data towards AI Success

on March 24, 2019

The worn cliche: Data is the new oil!

Your organization may be big or small. It might be a mature business with an approved device pipeline, or an early-stage one. Whatever adjectives describe its nature, size and characteristics, one thing is common: if it is a healthcare organization (or any type of organization, though we are focused on healthcare here), it has data. That data can be classified and categorized in several ways and, depending on how it is analyzed and applied in alignment with your organizational goals, it could deliver significant impact and ROI in the immediate or distant future.

Thus, whether the cliche seems tired to you or not, there is a lot of truth to the statement that your data can be useful in several ways in the future: it could represent inherent value for investors, and it could help with forensics and root cause analysis, new designs, and optimizing performance, manufacturing, service and competitiveness, possibly in many ways you haven’t yet imagined! This is why you need a sound data strategy.

Storage is Tactical, Not Strategic

If you think, “well, our strategy is to acquire the data, store it in the cloud and back it up for when we are ready for AI”, you are confusing tactics with strategy. Some might argue that merely storing data is even more banal than a tactic. There might be truth to that, but let us make sure that you do have sound plans to acquire and store data, with backup plans that are better than what MySpace appears to have been employing (for the Rip Van Winkles out there, I have provided a link below to the recent MySpace disaster, in which it essentially lost approximately a decade’s worth of its users’ music)!

So, your data strategy needs to be a lot more than storage.

However, do not take storage lightly! In my career, I have seen organizations with weak leadership collect and archive data in such foolish ways (and this is putting it kindly) that they lost critical historical information relating to verification and validation!

So, while it might seem simple, make sure your organization has well-tested and trusted methods of collecting and archiving data, backing it up appropriately and retrieving it securely and with reasonable ease. At a minimum, you should consult with Data Scientists if your size and budget do not permit an in-house Data Science team. In a small organization, it might be appropriate for the Quality and/or Regulatory functions to bear this responsibility. For device and drug companies, regulations already require much of this, and yet you would be surprised how easily organizations miss the mark.
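One simple, widely used way to check that an archive can really be retrieved intact is to record a checksum for every file when it is archived, and to compare the checksums again after every backup or restore. The Python sketch below shows the idea using only the standard library; the function names are hypothetical, and a real system would also need access control, off-site copies and documented restore drills.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map every file under `folder` (by relative POSIX path) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict, copy_folder: Path) -> list:
    """List files that are missing from, or altered in, a backup or restore."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_folder / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"altered: {rel_path}")
    return problems
```

In practice, the manifest would be saved (for example as JSON) next to the archive and checked after every backup and every restore; an empty list means every file came back bit-for-bit identical.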

Labeling

There are several definitions of data labeling, and of data structure. I will leave looking them up as a basic exercise for you, the reader; I am sure you have a commonsense understanding of what the terms allude to, so let us use that as a yardstick and get to the thrust of this blog, rather than reducing it to pedantic and semantic exercises. Data labeling means marking data with what it represents: body temperature, instrument temperature, body weight, age, sex, heart rate and so on. For anyone who has done even simple bench-top testing, it will be obvious that you already have to take a lot of care in labeling data, or the losses can be quite catastrophic.

Labeling tends to be the most expensive and time-consuming exercise in preparing your data for Machine Learning, as anyone who has attended one of my talks on this topic has been clearly warned!

Labeling should be understood in broad terms that go beyond merely the titles of columns and rows on a spreadsheet. For instance, when you pull up several spreadsheets, you should readily be able to tell which data came from acute or chronic studies, from bench-top or human trials, and whether it is real or simulated. Confuse these, and your machine learning might be trained on the wrong data, with quite undesirable results.
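As a minimal sketch of what such broad labeling can look like in practice, the Python below attaches a provenance label to every dataset and hands nothing to a training pipeline unless its source, study duration and real-versus-simulated status match the request. The class and field names are hypothetical; your own SOPs would define the actual vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    BENCH_TOP = "bench-top"
    HUMAN = "human"


@dataclass(frozen=True)
class DatasetLabel:
    """Hypothetical provenance label attached to every exported dataset."""
    study_id: str
    source: Source
    duration: str    # "acute" or "chronic"
    simulated: bool  # True for simulated data, False for measured data

    def __post_init__(self):
        # Reject free-text values so "Chronic", "chr." etc. cannot creep in.
        if self.duration not in ("acute", "chronic"):
            raise ValueError(f"unknown study duration: {self.duration!r}")


def select_for_training(labels, *, source, duration, simulated=False):
    """Return only the datasets whose provenance matches the request."""
    return [
        label for label in labels
        if label.source is source
        and label.duration == duration
        and label.simulated == simulated
    ]
```

Making the label a controlled vocabulary, rather than a free-text note in a file name, is what lets a machine (and an auditor) enforce it.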

Data labeling should also be very clear at every level, and this cannot be stressed enough! If you have data collected by 4 thermocouples, then two years from now, anyone who pulls up the data should be able to identify exactly which thermocouple each reading came from!
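To make the thermocouple example concrete, here is a hypothetical Python sketch in which every reading carries its channel’s location, serial number and units, instead of relying on a column called “T3” that someone has to decode two years later. The channel map, locations and serial numbers are placeholders, not real hardware.

```python
import csv
import io

# Hypothetical channel map for a fixture with four thermocouples. The
# locations and serial numbers are placeholders, not real hardware.
CHANNELS = {
    "TC1": {"location": "inlet", "serial": "SN-1001", "units": "degC"},
    "TC2": {"location": "outlet", "serial": "SN-1002", "units": "degC"},
    "TC3": {"location": "tip", "serial": "SN-1003", "units": "degC"},
    "TC4": {"location": "ambient", "serial": "SN-1004", "units": "degC"},
}

FIELDS = ["timestamp_utc", "channel", "location", "serial", "units", "value"]


def label_readings(timestamp_utc, readings, channels=CHANNELS):
    """Attach the full channel label to every reading (one row per reading)."""
    rows = []
    for channel_id, value in readings.items():
        meta = channels[channel_id]  # an unknown channel raises KeyError, on purpose
        rows.append({"timestamp_utc": timestamp_utc, "channel": channel_id,
                     **meta, "value": value})
    return rows


def to_csv(rows):
    """Serialize labeled rows to CSV text with an explicit header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing one row per reading repeats the metadata, but it means no reading can ever become separated from its label.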

In effect, you should establish a discipline of rigorous data labeling, from the macro level down to individual data points, through your SOPs, Work Instructions and Forms, and make it a part of your organizational culture.

Structuring Data

Data can be classified into Structured Data and Unstructured Data. Structured Data is the type of data discussed above: data obtained manually, automatically through machines, by programming, through logs, etc. (I will make a note about data logs in a moment.) Unstructured Data, on the other hand, could come from sources such as online customer reviews (if you think this is not for you, just search the internet for reviews of IUDs or heartburn medications, and you are in for a surprise), physician or nurse feedback, complaint information, procedure videos and other sources. Depending on the device, drug, biologic, app or combination thereof, your unstructured data can take many forms, and you should develop a very deep understanding of these sources, and of how to collect and structure this data as much as possible.

We all assume that some day, in the near or far future depending on whose predictions you believe, Artificial General Intelligence will allow us to skip data labeling completely (Neural Networks can already do this to an extent) and will analyze data and provide insights on its own. However, current Machine Learning and Deep Learning approaches have limitations, and you are better off providing as much structure to your data as possible.

For instance, if you are capturing procedure videos, whether on models or on humans, you might benefit from using fiducial markers, which are already common in imaging systems. Your device could also carry identification markers that show its position, location, etc., making understanding fast and easy for both humans and machines. This can be very helpful for explainability and machine training, and could essentially convert your data from ‘unstructured‘ to ‘semi-structured‘ or, if you can further template and refine it, ‘structured‘. Of course, you might hit a point of diminishing returns, or find that the machine already handles semi-structured data well enough.

Identifying the sources of your unstructured data, and striving to provide structure to your data early and strategically, will make Data Analysis and Machine Learning projects less expensive and easier to carry out.

A note on Machine Logging

I was once involved in a project where the capital equipment logged critical data such as currents, voltages and other data associated with procedures. Various regulations require this, and it is generally essential for Failure Analysis, Adverse Event Investigation, etc. The fundamental problem with this particular data logging (and with others I have seen) was a lack of foresight and poor implementation. From the labeling of the log files, to the time stamps, to the data collection frequency and various other features, it was a complete mess. The data had to be pulled from the logs into a spreadsheet program, and a number of routines had to be run on it to make it passably useful. I used to wonder how a diligent auditor would react to the cumbersome and questionable logging, if he or she were to stumble on it!

Worse, the project leader prided himself on having created an innovation by translating the gibberish into a-little-less-gibberish through his scripts! That sort of poor planning and lack of respect for data will not help you succeed.
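By contrast, a log that is born structured needs no rescue scripts. The sketch below assumes a hypothetical device that samples current and voltage at a fixed period. It writes one JSON object per line, with a UTC timestamp in ISO 8601 format, the device serial, the sampling period, and units built into the field names, and it names each file after the device and date. The field names and file-naming scheme are illustrative assumptions, not a regulatory format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_filename(device_serial: str, day: datetime) -> str:
    """Self-describing file name: device serial, UTC date and content type."""
    return f"{device_serial}_{day.strftime('%Y-%m-%d')}_procedure.jsonl"


def log_record(device_serial, procedure_id, sample_period_ms,
               current_a, voltage_v, now=None):
    """Return one JSON line for a single sample; units live in the field names."""
    if now is None:
        now = datetime.now(timezone.utc)
    record = {
        "timestamp_utc": now.isoformat(timespec="milliseconds"),
        "device_serial": device_serial,
        "procedure_id": procedure_id,
        "sample_period_ms": sample_period_ms,
        "current_A": current_a,
        "voltage_V": voltage_v,
    }
    return json.dumps(record)


def append_sample(folder, device_serial, procedure_id, sample_period_ms,
                  current_a, voltage_v):
    """Append one sample to today's (UTC) log file for this device."""
    now = datetime.now(timezone.utc)
    path = Path(folder) / log_filename(device_serial, now)
    with path.open("a", encoding="utf-8") as log:
        log.write(log_record(device_serial, procedure_id, sample_period_ms,
                             current_a, voltage_v, now=now) + "\n")
```

Because every line is self-describing, any standard JSON reader can load the log directly, with no spreadsheet gymnastics in between.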

The Bigger Picture

I want to make it clear: this is not just an exhortation to make machine logs readily readable, although, yes, your software team MUST do this. The point here is awareness, education, discipline and diligence toward data. Your organization needs to strive toward all of these, and two more, which follow in the next section. It is not just leadership: your entire organization needs to focus on this for success.

Patient Privacy and Data Security

Increasingly, organizations ranging from Government agencies to Data Silo Owners and other third parties are admitting to failures in securing data, exposing data belonging to their stakeholders. As you can imagine, exposing your patients’ data erodes trust, and it is unforgivable if the root cause is carelessness and sloppy, outdated practices.

Even when patient privacy is not involved, losing data of any other kind, such as test data, designs or manufacturing data, can be catastrophic to your competitiveness, and can make or break your organization.

Efforts to secure your data and protect your patients can never cease, and there can never be complacency in any of these areas!
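One safeguard among many is to keep direct patient identifiers out of analysis datasets altogether. The Python sketch below, an assumption on my part rather than a complete de-identification scheme, replaces a patient identifier with a keyed hash (HMAC-SHA256): the same patient always maps to the same token, so records can still be linked, but without the secret key nobody can compute the token for a guessed identifier. The identifier format is hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patient_id: str, key: bytes) -> str:
    """Replace a patient identifier with an HMAC-SHA256 token (64 hex characters).

    The same identifier and key always yield the same token, so records from
    different sources can still be linked; without the key, the token for a
    guessed identifier cannot be computed.
    """
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def new_key() -> bytes:
    """Generate a random 256-bit key; store it apart from the data."""
    return secrets.token_bytes(32)
```

If the key leaks, the tokens offer little protection, and if it is lost, new records can no longer be linked to old ones, so the key deserves as much care as the data itself. Removing identifiers alone also does not guarantee anonymity, since other fields can still point back to a patient.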

Conclusion

The larger conclusion I want you to draw from this post is this: you need a Data Strategy. Within that strategy lie elements such as labeling, structuring, security and privacy, none of which can be ignored. Your organization’s members need to be educated about the importance of the strategy, and they must have a healthy respect for it, so that they can help formulate, improve and maintain the strategy and the discipline it will take for you to succeed.

References:

Tactic Vs. Strategy: https://en.wikipedia.org/wiki/Tactic_(method)
Image of Tags: https://www.pexels.com/photo/several-assorted-color-tags-697059/
Image of Book Stacks: https://www.pexels.com/photo/pile-of-books-on-gray-metal-rack-1853836/
Image of Escalators: https://www.pexels.com/photo/building-escalator-1769356/
Image of Labeled Beans: https://www.pexels.com/photo/assorted-beans-in-white-sacks-1024005/
Image of Laptop: https://www.pexels.com/photo/low-angle-view-of-human-representation-of-grass-296085/
Data Loss at MySpace (a NY Times subscription may be required): https://www.nytimes.com/2019/03/19/business/myspace-user-data.html

Interesting AI & Immune System Initiative by Microsoft and Adaptive Biotechnologies that leaves behind some questions

on February 10, 2018

by Srihari Yamanoor

It is no secret that the larger technology companies, including Google, Microsoft, nVidia, IBM, Apple and others, want to dominate AI as well as healthcare, in an ever-expanding competitive landscape. While it is anyone’s guess whether they will succeed or get upstaged by smaller, nimbler firms in either arena, their moves are definitely interesting to watch. Many of these moves appear benign but could lead to cannibalization, such as the “AI Contests” some of these organizations put up (more posts to come on this).

Partnerships can cut both ways, I suppose, and are probably a strategic way to externalize the risk of failure. In that sense, in the current example, both Microsoft and Adaptive Biotechnologies appear to want to play to their strengths. The premise of the partnership is also quite intriguing. You can read it from the horses’ mouths in the links provided below; I will summarize it and lay out a few thoughts that come to my mind.

Essentially, the project would turn the body’s immune system itself into the data source for diagnosis. Every time the immune system responds to a disease, T cells whose T-cell receptors (TCRs) recognize the disease’s antigens multiply. Microsoft and Adaptive postulate that mapping these TCRs through a simple blood test can go a long way toward early diagnosis of an array of diseases. To say the least, the project is ambitious, and here are a few thoughts:

Accurate diagnosis and personalized therapy require knowledge of the state of the human body and its disease. Mapping a person’s genetics and considering their epigenetics, lifestyle, etc. is complex enough, and it still might not be enough! TCR mapping could therefore allow for quicker diagnoses, if the theory pans out at large scale. It is still unclear to me whether mapped TCRs, with or without machine learning, can actually yield the necessary diagnostic clues for a wide variety of diseases. However, they might supplement diagnostic efforts alongside genetics, epigenetics and other health data sources.
From a business angle, I also find this intriguingly different from the general bedlam of text processing à la Watson, and from all the algorithms rushing to read and reinterpret images, as with nVidia and others. Microsoft appears to have sought out and found a partner with a unique approach to applying machine learning in healthcare.
Any large set of unknown targets, powered by data, might appear to be a classic problem for machine learning to solve. However, Microsoft and Adaptive (Microsoft has apparently now invested in Adaptive) might have joined a Kool-Aid club that puts the cart before the horse. What I mean by stringing together those cliches is this: medicine already suffers from a paucity of knowledge when it comes to diagnosis, until a progressed disease makes things clearer. This is fundamentally because disease precursors are poorly understood, for want of clinical research, not for lack of intelligence. What is to say the TCRs won’t just set off an array of false and confounding alarms? Yes, with liquid biopsy and other such hyped-up methods out there, the industry is in a rush for quick fixes. It may well turn out that this is much harder to resolve, requiring clinical studies and protocols to demonstrate that TCR expression, the relative proportions of different TCRs, etc., truly indicate the preliminary stages of a disease. I am not convinced yet.
There is also a maddening rush out of the gate to define universal tests from pinpricks of blood. While I am not suggesting we return to good old, barbaric bloodletting (although there are all kinds of people out there “thirsting” for its comeback), I think this is an unnecessarily over-constrained problem definition, perhaps meant to make titillating fodder for press releases and blog posts. There might also be an urge to combine these pinprick tests with diabetes monitoring and such. While it is tempting to fantasize about such possibilities, and at some point they might come to fruition, there is no need to go to such extremes before solving the fundamental problems in medicine: accurate diagnosis and targeted therapy. For example, when should a person’s blood be drawn? How frequently? Should the frequency change when a certain set of TCRs is observed? There are so many things to worry about here. I wish companies would stop using overly broad terms such as “universal”.
In my posts here, in my talks, in daily discussions and so on, I always come back to a few bugaboos. Who will own the TCR mappings? Who owns the output of the machine learning algorithms? Will they be patented and locked away? How will such diagnostic methods be regulated? Validated? Will Explainable AI (see the DARPA explanation linked below), which I believe should be a fundamental principle in healthcare, be required and used judiciously? And on and on we go.
Data has been walled off quite well in the healthcare industry up to this point. Yes, we got the human genome, but much, much more sits behind curtains, masks and whatever other cliches you can think of, so every new technology that promises to expose and dig through data always raises ownership concerns for me.
The “Theranos” Effect: If you are like me and know the story of Theranos, you are still sitting up at night, jaw dropped, wondering how in the hell the company is still in vogue (I have written about this on my medical devices blog, in fact using the same Pixabay image; see the link below). I have also linked one of several dozen well-written pieces that offer us a cautionary tale, which I have named the “Theranos” Effect. In summary, this company went into the “over-promise and extremely under-deliver” (or never deliver, to date) business. It engaged in egregious and unethical business practices, fooling the industry, investors, partners and more along the way. With all the promise of AI, how do we make sure companies don’t repeat such ugly incidents? Mind you, I am not pointing fingers at Microsoft; I think this is indeed a great effort. I am just offering this up as an important cautionary tale for people in healthcare, and in any industry for that matter. I understand, as much as anyone else, that businesses need hype to push their products. However, it would behoove you to make sure you don’t push things off a cliff…
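The worry about false and confounding alarms can be made concrete with Bayes’ rule: when a condition is rare in the population being screened, even a test that is right most of the time produces mostly false positives. The numbers below are purely hypothetical and are not claims about any TCR-based test.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Share of positive results that are true positives (Bayes' rule)."""
    p_true_positive = sensitivity * prevalence                 # ill and flagged
    p_false_positive = (1.0 - specificity) * (1.0 - prevalence)  # healthy but flagged
    return p_true_positive / (p_true_positive + p_false_positive)


# Hypothetical screening scenario: a rare condition and a fairly good test.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95, prevalence=0.01)
print(f"{ppv:.1%} of positives are real")
```

With these assumed figures, only about 16% of positive results are real: out of 10,000 people screened, about 95 of the 100 who are ill test positive, along with about 495 of the 9,900 who are not.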

CONCLUSION

In mankind’s march toward the goal of a healthy future for all, we have many strides to make. Naturally, we want to be as accurate and thorough, yet economical, as we can. We therefore rely on technological breakthroughs on one end, ranging from improvements in basic science to sensors and AI, and on the economics of consuming less test material, shortening time to diagnosis and other aspects on the other end. What Microsoft and Adaptive aim to do with their symbiotic-looking (investment-based) partnership is commendable. It may take us one step closer to our goal, or it may not be the step that takes us there at all. Only time can tell, and in the meantime, I hope commonsense and ethics prevail over hype and fantastic marketing materials.

Should you have something to add, please leave a comment below.

Subscribe and Support, Please!

Did you enjoy this post? Please subscribe for more updates, using the sidebar. Have ideas or blog posts you’d like to see here? Contact me at yamanoor at gmail dot com.

References:

The Microsoft Blog Post: https://blogs.microsoft.com/blog/2018/01/04/microsoft-adaptive-biotechnologies-announce-partnership-using-ai-decode-immune-system-diagnose-treat-disease/?imm_mid=0fa701&cmp=em-data-na-na-newsltr_ai_20180115
Adaptive’s Press Release: https://www.businesswire.com/news/home/20180104005464/en/Adaptive-Biotechnologies-Announces-Partnership-Microsoft-Decode-Human
DARPA, on Explainable AI: https://www.darpa.mil/program/explainable-artificial-intelligence
Vanity Fair on Theranos: https://www.vanityfair.com/news/2016/09/elizabeth-holmes-theranos-exclusive
Myself, writing with incredulity on Theranos’s longevity: http://chaaraka.blogspot.com/2017/12/theranos-lives-to-die-another-day.html
Image courtesy, Pexels+Pixabay: https://www.pexels.com/photo/white-and-clear-glass-syringe-161628/

Machine Learning shows promise in Dementia prediction

on January 12, 2018

by Srihari Yamanoor

I was flipping through some archives and found this Scope (a Stanford University School of Medicine publication) article describing a machine learning tool (link below). Scope calls it AI; the authors of the tool, in their paper (also linked below), rightly categorize it as ML, a subset of AI as we generally describe it. I always love having access to the paper behind a study. It makes things much easier.

So, the folks at McGill trained an ML system on PET scans from people showing symptoms of mild cognitive impairment, to see which of them would go on to develop Alzheimer’s, given that not all of them do. They taught the system to focus on elevated protein levels in specific brain regions to make its predictions.

On an independent test set, the tool predicted progression to dementia with 84% accuracy. Read more in the paper; I want to share a few thoughts below.

I think tools like this will become the norm over time. However, right now, they lack the kind of standardization and maturity required for integration into clinical practice. I don’t mean to state that in a negative sense. Such efforts take time, effort and funding, of course.
An 84% accuracy is not enough, not even for a supporting tool, and not even when humans are completely in charge. Improving it will take training on larger data sets, better algorithms and other improvement methodologies. The field could also use some standardization, which could then be spread to all ML, DL and AI tools that use imaging for diagnostics in healthcare.
The future should consist of such tools passively (and, when necessary, actively) siphoning your imaging and other data off your EHRs and parsing them to see if predictions can be made. This, however, requires more groups such as the ADNI (Alzheimer’s Disease Neuroimaging Initiative), whose participants’ imaging and other data were used here, as well as collaboration among hospitals, insurance companies and governments.
To improve diagnosis across ages, sexes, races and other discriminating factors, global co-operation would be required.
Of course, we need to combine various types of data, ranging from imaging to genetics, epigenetics and other sources, to make diagnosis efficient. Perhaps this combination is one way to get past the 84% accuracy of this tool, until imaging alone produces better results. At that point, say you make predictions based on imaging, genetics, lifestyle and other factors, and they all agree. You can then use whatever interventions are available (this is a key factor missing in all the hype about machine learning: you learn something, yes, but what do you DO?) to delay, treat and cure disease in patients.
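It is worth remembering that a single accuracy figure hides how the errors are split. The Python sketch below uses two invented confusion matrices, each covering 100 patients with 84% accuracy, to show that one tool can miss more patients who go on to progress while another raises more false alarms. These counts are hypothetical and are not taken from the McGill paper; consult the paper for the metrics the authors actually report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical counts from a test set of patients with mild cognitive impairment."""
    tp: int  # predicted to progress, and did
    fp: int  # predicted to progress, but did not
    tn: int  # predicted stable, and stayed stable
    fn: int  # predicted stable, but progressed

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        """Share of patients who progressed that the tool flagged."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of stable patients the tool correctly left unflagged."""
        return self.tn / (self.tn + self.fp)


# Two invented test sets of 100 patients each (40 progress, 60 stay stable).
cautious = Confusion(tp=30, fp=6, tn=54, fn=10)
eager = Confusion(tp=36, fp=12, tn=48, fn=4)
for name, c in (("cautious", cautious), ("eager", eager)):
    print(name, f"acc={c.accuracy():.2f}", f"sens={c.sensitivity():.2f}",
          f"spec={c.specificity():.2f}")
```

Both report an accuracy of 0.84, but the first catches only 75% of the patients who progress (specificity 0.90), while the second catches 90% at the cost of more false alarms (specificity 0.80). Which trade-off is acceptable depends on what interventions are available, which is exactly the question of what you DO with the prediction.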

When I find more such interesting studies, I will share similar and other thoughts on Machine Learning, Deep Learning and AI, and their impact on Healthcare.

References:

The Stanford SCOPE Article: http://scopeblog.stanford.edu/2017/08/29/artificial-intelligence-can-help-predict-who-will-develop-dementia-a-new-study-finds/?imm_mid=0f5d75&cmp=em-data-na-na-newsltr_ai_20170904
The McGill Paper: http://www.neurobiologyofaging.org/article/S0197-4580(17)30229-4/fulltext
Image courtesy, Pexels: https://www.pexels.com/photo/brain-color-colorful-cube-19677/
