From: John G. Sotos
To: AI-MEDICINE@medmail.Stanford.EDU
Date: Tue, 12 Dec 1995 04:43:37 -0500
How, if at all, should the U.S. Food and Drug Administration regulate the marketing of knowledge-based medical computer systems? This topic came up at a workshop, "Evaluation of Knowledge-Based Systems," held at the National Library of Medicine last week. The topic is timely because the FDA is going to be holding two public meetings in 1996 as part of their continuing effort to formulate a policy for regulation of software products and computer-controlled devices. Since the first of these meetings is in January and since the FDA wants to hear from academia and industry, it makes sense to start batting some ideas around now.
I'll go first and lob a juicy one for people to swing at.
The workshop discussed many problems associated with evaluation of knowledge-based systems, specifically, that evaluations are expensive, draining, and very hard to do right (as if anyone knows how to do them right). There were concerns that regulation by the FDA, if modelled after the severe way it regulates drugs and medical devices, would impose a burden that would choke off this industry just as it's getting started. This view would argue that regulation is best performed by the marketplace. Others, however, put little faith in the marketplace, since it can be slow and since correctness is not necessarily one of its criteria (e.g. there are books still alive in the marketplace that claim foot massage can cure diabetes).
I think there is a middle ground: require all medical software to bear some labelling, and use the FDA to enforce truth-in-labelling. The FDA could require every piece of medical software (and every computer-controlled device) to carry a label that tells the extent to which the system has been evaluated. An "evaluation rating" of _Zero_ would mean that no formal evaluation of the system has been performed. The FDA should give the _Zero_ label away for free: that is, the FDA should approve any system submitted to it, so long as the system bears the _Zero_ label. To receive a higher rating (_One_, _Two_, etc.), the manufacturer would evaluate the system in a formal study, then submit the results to the FDA. This could happen before or after marketing begins. The FDA review panel would then examine both the outcome and the quality of the evaluation study's design in deciding what rating the system deserves. Factors like randomization, blinding, and hidden biases would be considered in assessing the study's quality.
In essence, the FDA's review boards would change from the life-or-death Roman Emperor model (thumbs up/thumbs down) to the free-speech Motion Picture Rating model (G, PG, PG-13, R, NC-17, X). Critical, "high stakes" software systems, such as pacemaker and defibrillator controllers, should, however, continue to be held to the high standard in use today.
This scheme has advantages, challenges, and disadvantages.
Anything can come to market and it can come to market fast, but the customer is made aware of exactly what evidence stands behind the program. There will simultaneously be a huge incentive for companies to do the evaluation studies and move up the rating ladder, since liability concerns will surely dissuade physicians from using level _Zero_ systems. ("So, Dr. Osler, tell us again why you used the BrandX device on poor Mrs. Jones, when it was there in big letters that the device had never been formally evaluated?!?") But, if a _Zero_ product truly is innovative and there is no alternative available that has a better rating, or no alternative is available at all, then Dr. Osler can at least assemble a credible defense.
This challenge should attract people in our community. Of note, software systems have already been devised that assist with the design of studies and that also rate study designs, so this is not where the difficulty is. A December 1975 paper in _SIGART Newsletter_ by Shortliffe & Davis (discussed at the workshop) delineates eight stages of evaluation for knowledge-based systems, but it's unlikely the answer will be that easy. I think everyone at the workshop recognized that a randomized, blinded, prospective study yields the best data, but everyone also cringed at the amount of effort and the amount of resources it takes to conduct such a study, and recognized that other study designs can be very informative.
Aside: How would this scheme impact the academic world?
Workshop participants remarked that NIH study sections often demand evaluation studies almost to the exclusion of development work! Because the rating scheme applies to commercial-grade software, it would be unfortunate for the same ratings to find their way into grant reviews. Fractional ratings at the bottom of the rating ladder may provide sufficient resolution for the progress of evaluation in the academic world (e.g. 0.2, 0.6, etc.).
If this is true, then we can expect the software label to ultimately resemble a food label, with separate ratings for the analogs of fats, proteins, sugars, etc. The fewer the scales, the better. In a similar vein: should there be separate scale ratings reported for when evaluations are performed with novice users and for evaluations conducted with experienced users (or other well-defined sub-populations of users)? If so, companies with large market share will enjoy a sustained ratings advantage, since there will be more subjects available for study in the "experienced" group.
If, for example, I change 40% of the rules in my product's KB (knowledge base) during the change from version 2.0 to version 3.0, should I have to surrender the lucrative _Three_ label that I spent millions of dollars earning just last year?
With no financial barrier to entering the market, small companies can, with their earnings on the level _Zero_ version of the product, pay for the studies needed to move up the rating ladder. Unfortunately, a significant competitive advantage will accrue to large companies with deep pockets who will be able to finance evaluation studies earlier than the small companies. This can, perhaps, be circumvented by placing an onerous tax on products when higher level ratings are granted to big companies (Rush Limbaugh will go crazy) or, as the FDA is soon to do with medical devices, by charging an up-front fee for its reviews, with the fee determined by company size.
The FDA has really started pushing its "Medwatch" program, which is designed to collect reports of adverse events from physicians. Because all medical software will be registered with the FDA as part of granting the _Zero_ label, the FDA will be able to set up a similar Medwatch program for software. The FDA could even require a "Medwatch Module" in medical software so that users could, with one mouse-click, be connected to the FDA home page where the adverse event report would be solicited. (One can imagine the FDA being buried with reports of inane user mistakes, but it's unlikely the FDA will trust manufacturers to receive reports and forward the significant ones to the FDA.) Perhaps there should be a provision for decrementing a product's rating if there are too many adverse events reported.
Mandatory, easy-to-understand ratings that have the force of the FDA behind them will make the medical software marketplace frighteningly competitive. It will be like having the Consumer Reports rating on your package. Product lines will live and die by these ratings. If you are an executive, stockholder, or employee of a medical software company, your parietal cells will be pumping acid like there's no tomorrow every time there is a rumor of a competitor beginning an evaluation study.
****
It's my impression that the FDA wants to Do the Right Thing here and is genuinely interested in what we have to say. Does anyone know if AMIA (American Medical Informatics Assn) is going to be involved in the FDA process on a formal level?
John Sotos