The Microsoft Octopus and our Health Data

Big data, algorithms and artificial intelligence: a collective of caregivers and computer scientists takes on the government.
Under the guise of fighting the epidemic and declaring a health emergency, the government has launched its mega health data platform project, hosted by Microsoft. A short dive into artificial intelligence in medicine with the Inter-Hop collective, which has campaigned since the project began against this giant platform and for the autonomous, human-scale use of health data.

This article is a collaboration with the newspaper Lundi Matin. You can find the original article by following this link. Thanks to them.

The law of July 2019 “relating to the organisation and transformation of the health system” includes a section on “the digital ambition in health”: the text declares it necessary to “fully deploy” “telemedicine” and “telecare” and creates a “Health Data Platform” responsible for “gathering, organising and making available” health data from various existing files. The idea is to set up a huge warehouse of data produced by organisations working in the healthcare sector in order to feed and develop algorithms, in other words to make the capabilities of artificial intelligence (AI) in the field of health literally explode.

This Health Data Platform, known as the “Platform”, was created precisely on the recommendations of a report by MP Cédric Villani which, in March 2018, aimed to “position France at the forefront of AI”. Health was one of the five sectors to be “invested” in as a priority, alongside education, agriculture, transport and armament. In state-of-the-art newspeak, the parliamentary report urges the “public authorities” to adapt very quickly, “otherwise they will be condemned to witness, powerless, the complete reformulation of public health issues and medical practices”. Big deal!

The word “Platform”, used by the protagonists of this scam, actually refers to two very different things: the IT solution for storing and using data, and the public-private consortium (legally a “Public Interest Group”) set up specifically to implement and orchestrate that technical solution. To further blur the lines, the texts explicitly state that the “Platform” must be called the “Health Data Hub” (HDH) “in the context of its international communications”, the first serious sign of American commercial interests in this matter.

Since the law of July 2019, which accelerated the creation of the public-private consortium, it has essentially been ministerial decrees and totally opaque choices that have gradually defined the outlines of the HDH’s legal and IT system. Those who opposed this project to centralize the processing of health data from the outset were not disappointed: first, Microsoft’s cloud was chosen as the subcontractor providing the servers to store and analyze the data files. This infrastructure runs on software whose source code is not public, which makes it impossible to know what is really going on inside the machines; it also complicates any future migration to servers that do not use Microsoft software. Who will be surprised to learn that Microsoft’s designation breached the competition rules applicable to public procurement, since no call for tenders was issued for this IT hosting contract? Certainly not Edward Snowden, who has publicly denounced the choice of centralising data hosted by such a giant.

“It seems that the French government will capitulate to the Cloud cartel and provide the country’s medical information directly to Microsoft. Why? It’s just simpler.” (Edward Snowden, @Snowden, 19 May 2020)

In April 2020, the government took advantage of the state of health emergency to bypass the project’s opponents and order the technical platform into operation ahead of schedule. Invoking the urgent need to improve knowledge about Covid-19, the ministers simply ignored the opinion of the French Data Protection Authority (CNIL), which had expressed several serious reservations about the vagueness still surrounding the computer architecture of a platform meant to host particularly sensitive data. The CNIL had expected to work closely with ANSSI, the French Agency for the Security of Information Systems, on the development of the technical component, which is absolutely crucial, but the heads of the public-private consortium decided otherwise.

Under the law of 11 May 2020 extending the state of health emergency, the data in the new “SI-DEP” (Screening Information System) file, relating to persons infected with Covid-19 and persons who have been in contact with them, are transmitted to Microsoft’s servers. According to the same article of the law, this data sharing, which takes place “without the consent of the persons concerned” and officially “for the sole purpose of combating the spread of the covid-19 epidemic and for the duration strictly necessary for this purpose”, may continue “for a period of six months from the end of the state of health emergency”. And of course, this is what was decided: the official, and largely fictitious, end of the state of health emergency on 10 July did not stop contamination data from being sent to Microsoft’s servers.

Of course, this Platform project is not new: it was a law of January 2016 on the “modernization of our (sic) health system” that created the National Health Data System (Système National des Données de Santé, SNDS). Its official aim is to improve access to health data so that “their potential is used in the best interests of the community”. Behind this system of files there were initially two objectives: to produce data for research and to improve evaluation tools, that is, the budgetary control of health care structures. In practice, it gave a boost to the collection and sharing of health files and to the move towards what is known as “big data” in health. What is “big data”? It is a large amount of data that comes from different sources and is aggregated very quickly within a single infrastructure. This matters because what is called artificial intelligence in its most recent form, which has nothing to do with intelligence, cannot work without huge amounts of data. With the big new Platform, which will eventually centralize dozens of files, the idea is to create an inexhaustible source of “health data” to feed algorithms. Algorithms, what for? What data is stored and processed by Microsoft? How is it anonymized? Who will be able to access it, for what purposes and following what procedure?

But also, how can we think of other ways of organising the computerised processing of health data that would genuinely improve the quality of care and the well-being of individuals? Should artificial intelligence, that is, the algorithmic processing of billions of individual data points, be used in medicine? It was around these questions that we met members of the Inter-Hop collective, which was created in opposition to the logic of centralizing health data and in favour of knowledge sharing in medical informatics. It brings together computer scientists and healthcare professionals who support free software and the autonomous use of health data at the local level.

Lundi Matin: The HDH, which was rushed into operation under the guise of a health emergency in April 2020, replaces the “National Institute for Health Data”, which until now managed the main health data files. What changes with the new Platform? Let’s start, if you like, with the content. What data will be, or already is, centralized in the Platform?

Inter-Hop: Originally, when it started operating in 2017, the National Health Data System consisted of three main files, three medico-administrative databases. First, the Health Insurance data, which is basically the file containing information on the reimbursement of care and drugs by the National Health Insurance Fund. So, next to your social security number, you will find which doctors you consulted on which days, which medicines you have been taking and since when, etc. Then there are the data that come from hospitals: as soon as possible after each patient is discharged, the hospital has to draw up a “standardised discharge summary” (RSS). This computerized record contains a lot of data, including the patient’s date of birth, sex, postal code, dates of admission and discharge and main diagnosis (diagnoses are coded according to the International Classification of Diseases published by the WHO). This large database, oddly called the Programme de Médicalisation du Système d’Information (PMSI), was until then mainly used for the management control of hospitals, and it is notably on the basis of this file that the State decides how much funding to allocate to each hospital.

Reimbursements, hospital stays and, finally, the last large file: the death file, or more precisely the “medical causes of death” file. When a person dies, a doctor systematically draws up a death certificate which indicates, among other things, the person’s age and sex and the cause, day and place of death. Recently, doctors have become able to write these certificates “online”, on a tablet or even from their telephone. Officially, the innovation is supposed to speed up the “production of alert indicators” and even increase the confidentiality of the data thanks to encryption procedures…

With the law of 24 July 2019, the scope of the “National Health Data System” is considerably enlarged, since it is intended to collect “all the data collected during the acts covered by the health insurance”¹. This is quite simply enormous, since it covers the health data of all 67 million people living in France. It includes all the clinical data collected by caregivers, pharmacists and hospitals (admission and discharge dates, diagnoses, treatments administered, results of additional examinations, medical reports, genomics and medical imaging), but also data from research protocols, such as those of the CONSTANCE ‘cohort’ (a group of people followed in the same epidemiological study), which consists of 200,000 adults aged between 18 and 69 years².

But that’s not all: one of the first databases to be integrated into the new Platform, for example, is the OSCOUR® file (Organization for Coordinated Emergency Surveillance; bureaucrats love acronyms). This database, managed by Santé publique France, an offshoot of the Ministry of Health, covers more than 80% of emergency visits in France. For each patient admitted to an emergency department, it collects the following information: postal code of residence, date of birth, sex, date and time of admission and discharge, length of stay, mode of admission, origin, mode of transport, severity classification, main and associated diagnoses, mode of discharge, and destination for patients transferred within the hospital or to another facility. For this database alone, then, the information collected is abundant and precise.

Mention should also be made of the SI-VIC file, officially set up in the aftermath of the 2015 attacks so that, in an exceptional situation, the State could quickly count the injured and distribute them as well as possible among hospitals. In theory, the file is purely administrative: it contains the person’s surname, first names, nationality, date of birth and sex, but should not contain any medical information. However, according to the weekly Le Canard enchaîné of 17 April 2020³, some records of persons admitted to Paris hospitals in 2019, on the fringes of the yellow vest demonstrations, mentioned the nature of their injuries, making it possible to identify and therefore trace injured demonstrators. This confirms that filing people always poses a serious risk to freedom. And with the new HDH, we enter yet another dimension, since everything is centralized at Microsoft Azure…

A final word on the files: under the state of health emergency, three new databases have been created around Covid-19 infection: Contact Covid, STOP-COVID and SI-DEP. Contact Covid, for example, gathers the data collected by the Health Insurance Fund’s “guardian angel brigades” and covers in particular the identity, contact details and workplace of the persons declared as “contacts” by the infected patient. SI-DEP, for its part, gathers the results of the biological tests used to diagnose Covid. These files are enormous threats, because access to our medical data, once deemed crucial, can justify very deep intrusions into our lives.

In principle, on the Platform, all data must be “anonymized” or rather “pseudonymized”. Can you explain this concept of pseudonymisation and how the concentration of so-called pseudonymised data weakens their anonymisation?

A distinction must be made between pseudonymisation and anonymisation. In scientific research, which is supposed to be one of the Platform’s primary objectives, anonymization is not appropriate, because the best, and probably the only, way to anonymize data is, broadly speaking, to mix it up in a totally random way. But naturally, if this is done, the data no longer reflect reality and their value for research, particularly in health, disappears completely. Pseudonymization is then a kind of compromise: it consists in removing certain directly identifying data (surname, first name, social security number, date of birth, postal code, etc.) or replacing them with indirectly identifying data (an alias, an encryption key).

It is the National Health Insurance Fund that is “responsible” for the operations that link the different files together (this is called “matching”) and then for pseudonymising the data, which therefore takes place before the files arrive at Microsoft.
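To make pseudonymisation and matching concrete, here is a minimal sketch in Python. It assumes a keyed hash (HMAC-SHA256) as the pseudonymisation technique; the article does not describe the procedure actually used by the National Health Insurance Fund, and the key, field names and functions below are hypothetical. Because the same secret key always yields the same alias for a given social security number, records from different files can still be matched once names are removed. In this sketch only names and the social security number are dropped; any other field stays in the record unless it is explicitly removed.

```python
import hashlib
import hmac

# Hypothetical secret key, held only by the body that pseudonymises the files.
SECRET_KEY = b"example-secret-key-not-for-real-use"

# Hypothetical list of directly identifying fields removed by this sketch.
DIRECT_IDENTIFIERS = ("surname", "first_name", "social_security_number")


def pseudonym(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable alias from a social security number with HMAC-SHA256."""
    return hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and add a keyed alias used for matching."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["alias"] = pseudonym(record["social_security_number"], key)
    return out


# Two fictitious records about the same person, coming from two different files.
hospital_stay = {"surname": "Martin", "first_name": "Anne",
                 "social_security_number": "123", "diagnosis": "appendicitis"}
reimbursement = {"surname": "Martin", "first_name": "Anne",
                 "social_security_number": "123", "drug": "antibiotic"}

a, b = pseudonymise(hospital_stay), pseudonymise(reimbursement)
print(a["alias"] == b["alias"])  # same alias, so the two files can be linked
```

Note the flip side of this design: whoever holds the key can recompute the alias of any known social security number, which is one reason pseudonymised data is not anonymous.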

The problem is that, in practice, with merely pseudonymized data, it is always possible to trace the identity of the person concerned. For example, if, on the night of 3 December 2019, one or two people were admitted to the Nantes emergency department for acute appendicitis, even though the OSCOUR file does not contain their names, they could very easily be found by cross-checking the OSCOUR file with the file of the hospital department that then received them, or with the file of care or medication reimbursements. Researchers from UCLouvain (Université catholique de Louvain) and Imperial College London⁴ have shown that 83% of Americans can be re-identified using only three variables: sex, date of birth and postal code, data that are, for example, compiled in the OSCOUR file. With 15 variables, a person can be re-identified in 99.98% of cases.
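The cross-checking described here is what is usually called a linkage attack. The following Python sketch, with entirely fictitious records and hypothetical field names, joins a pseudonymised emergency file to a second file that still carries names, using only sex, date of birth and postal code. It also measures how many records are unique on those three fields, which is what makes re-identification possible; it does not reproduce the statistical model of the study cited above.

```python
from collections import Counter

# Quasi-identifiers: not names, but often enough to single someone out.
QUASI_IDENTIFIERS = ("sex", "birth_date", "postal_code")

# Fictitious pseudonymised emergency file...
EMERGENCY_VISITS = [
    {"alias": "a1", "sex": "F", "birth_date": "1984-02-11", "postal_code": "44000", "diagnosis": "appendicitis"},
    {"alias": "a2", "sex": "M", "birth_date": "1990-07-30", "postal_code": "44100", "diagnosis": "fracture"},
]
# ...and a fictitious file that still contains names.
NAMED_FILE = [
    {"name": "Person A", "sex": "F", "birth_date": "1984-02-11", "postal_code": "44000"},
    {"name": "Person B", "sex": "M", "birth_date": "1990-07-30", "postal_code": "44100"},
    {"name": "Person C", "sex": "M", "birth_date": "1990-07-30", "postal_code": "44100"},
]


def quasi_key(record):
    """Tuple of quasi-identifier values used as a join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(quasi_key(r) for r in records)
    return sum(1 for r in records if counts[quasi_key(r)] == 1) / len(records)


def link(pseudonymised, named):
    """Map aliases to names when the quasi-identifiers match exactly one named record."""
    candidates = {}
    for r in named:
        candidates.setdefault(quasi_key(r), []).append(r["name"])
    matches = {}
    for r in pseudonymised:
        names = candidates.get(quasi_key(r), [])
        if len(names) == 1:  # unambiguous match: the alias is re-identified
            matches[r["alias"]] = names[0]
    return matches


if __name__ == "__main__":
    print(link(EMERGENCY_VISITS, NAMED_FILE))
    print(unique_fraction(NAMED_FILE))
```

Here alias a1 is re-identified because only one named person shares her sex, birth date and postal code, while a2 remains ambiguous between two people. Every additional linked field tends to split such ambiguous groups further.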

The digitisation of the world allows each and every one of our actions to be recorded, analysed, evaluated and possibly interpreted. This giant new Platform is designed to interconnect several dozen files containing vast amounts of health data. The more databases are linked together, the higher the risk of re-identification. From now on, people receiving treatment in France, whose health data will feed the Platform, could still be identified by anyone who accesses the servers, for instance through Microsoft’s network administrators. Whether or not this is prohibited by law, recent history has shown us that legal texts are not sufficient guarantees of our privacy. As soon as an operation is technically possible, we should expect it to be carried out by the GAFAMs⁵ or by state law enforcement agencies.

Could you explain the role of this famous “Cloud”, these machines that not only store data but also read and process it? Microsoft claims that the data is encrypted, but you denounce the fact that the American company will, in any case, have access to the data “in clear”. To what extent will this multinational be able to stick its nose in and exploit the health data of millions of people living in France?

Data encryption should indeed be distinguished from pseudonymisation. Here, the two are combined: the data stored on the Platform are both pseudonymized and encrypted. For each file, pseudonymisation removes the directly identifying information (surname, first name, etc.), whereas encryption makes all the data in the file secret, that is, unreadable, so that a key is needed to decrypt them. In our case, the CNIL⁶ revealed that the keys to decrypt the files were held by Microsoft. Why? Because Microsoft does not only store the data on its servers. The “Cloud” is also a platform for analyzing and processing data⁷, as became clear during the litigation we brought, with other free software supporters, before the Conseil d’État⁸. In that case, we were challenging the April 2020 decree that triggered the accelerated implementation of the Platform. Unfortunately, the Conseil d’État refused to suspend the process, but it nevertheless ordered the public-private consortium that manages the Platform to provide the CNIL with information on the pseudonymization processes used. In theory, this should allow the CNIL to verify the level of protection of the health data hosted by Microsoft…

The hearing before the Conseil d’État also revealed that, in its normal operation, the technical platform uses 40 Microsoft Azure software programs. These programs are used to analyse the hosted data, a bit like a huge Microsoft Excel spreadsheet in which a gigantic company would do its accounting calculations.

While data can genuinely be kept encrypted when it is entrusted to a company solely responsible for hosting it, this is no longer possible if the hosting company also has to analyze the data and run it through the mill of several computer programs. In this case, therefore, Microsoft necessarily holds the decryption keys and can easily stick its nose into the health data it hosts on its servers. The argument that encryption protects us from the American giant is therefore null and void.

The project has been criticized by the CNIL, and by your association as well, because Microsoft is an American company, which means the American authorities could easily gain access to the data stored on its servers as part of a legal procedure. Under the Clarifying Lawful Overseas Use of Data Act (known as the CLOUD Act), the American authorities may, in the context of criminal investigations, require American companies that host and process digital data to hand over certain data stored not only in the United States but also abroad.

Furthermore, according to the CNIL, which had access to the contract signed with Microsoft, the document provides for “data transfers outside the European Union in the context of the platform’s day-to-day operation, particularly for maintenance or incident resolution operations”. If we understand correctly, the servers used by the Platform would not all be located in the same place and, in any case, the US authorities could easily access them?

On the Platform’s website, we learn that the Microsoft machines, or servers, hosting the French data are located in the Netherlands. A priori, this is not a problem, since this European state is supposed to apply European law on the protection of personal data, which is not perfect but at least provides a framework for requests for data transfers. Specifically, the provisions of the General Data Protection Regulation (GDPR) prohibit a court or administrative authority of a non-European country from directly accessing data hosted in Europe and protected as such by the Regulation (except under a mutual legal assistance agreement or a derogation relating to the vital interests of the data subject). In other words, in theory, the US authorities do not have direct access to data located on European servers, even if the company owning the servers is American. Strictly speaking, therefore, whether the machines are located in the Netherlands or in France makes no difference as far as protection against direct intrusion by the US administration is concerned.

That being said, the fact is that since the beginning of the project, the Platform’s managers had assured anyone who would listen that all the data would remain in France⁹.

Let’s say that this little lie only fed our mistrust.

Above all, the hearing at the Conseil d’État showed that Microsoft could not guarantee that the analysis data would remain in France, or even in Europe. And that is quite logical: the very principle of a “Cloud” is to make hundreds of machines work together in a network, and Microsoft has servers all over the world. Health data, which is sensitive by its very nature, may therefore, in the normal operation of the Platform, be moved around the world depending on the computing power demanded by the computer specialists¹⁰. And from the moment the data leaves European territory, the minimum protection afforded by the GDPR goes up in smoke.

Moreover, in the contract, Microsoft even states that it can use the Platform’s data to improve its own artificial intelligence algorithms…

According to the Platform’s president, OVH, Microsoft Azure’s French competitor, is also “globalized” and doesn’t offer better guarantees from a data protection point of view. What do you think about this?

In our view, the least bad solution would have been to choose European companies - which are not affected by possible US injunctions issued on the basis of the Cloud Act - and whose servers, located in Europe, are subject to European data protection regulations.

The CLOUD Act applies to US companies within the meaning of US law, i.e. to companies incorporated in the United States under US law (and to companies controlled by such US companies), which, to the best of our knowledge, is not the case for OVH. In addition, if the public-private consortium managing the Platform required OVH to ensure that the data centres used are exclusively located in the European Union, the data would in principle be protected by European law and could not be transferred outside the European Union without complying with the minimum formalities provided for by European law. That said, OVH is a large company which itself claims to be present in more than 19 countries around the world. As stated in its general terms and conditions, OVH may use subcontractors, some of them subsidiaries of the parent company and others outside the OVH Group, to assist it in storing and processing data. Thus, a large part of the problem with the Microsoft Azure contract is the centralization of all the data with a single (large) service provider, which may itself have deployed its infrastructure across a constellation of states. In other words, we are handing the health data of 67 million (natural) persons to a single legal entity that we must simply trust not to exploit everything its machines can do!

It seems to us that alternatives much more in line with real data protection exist. If, in the short term, we stick with the idea of data centralisation, there are already platforms such as Teralab, developed within a research institute of Mines Télécom, which can store and analyse very large quantities of data. Its machines are in Douai and run on open source software; the technical team is in Rennes and the rest of the staff in Paris. It already hosts health data. Why not continue along this path? But above all, why would anyone want to entrust all the health data of more than 65 million people to a single service provider?

As healthcare professionals, we believe that exploiting health data can benefit the quality of care. Evidence and data have been at the heart of much research since the late 19th century, and even more so since the development of what is known as “evidence-based medicine” (EBM). This approach consists, for the caregiver, in asking a carefully framed question about a given patient, which is then answered using several pieces of evidence that are collected, “evaluated” and finally applied in the way that seems most appropriate to the particular case. It is in fact a matter of bringing together, with the greatest possible discernment, the general and the particular. But to know, or at least understand, the ‘general’, clinical studies must be carried out systematically. These are what are rather unhappily called ‘cohorts’ of patients, from which researchers extract statistical data. These data are neither treasures nor enemies; they are what they are. To a certain extent, and used in certain ways, they can help to heal. For example, statistical studies on a large sample of people are needed to evaluate the effectiveness of a drug, even though that effectiveness, strictly speaking, varies from person to person, from one moment to another, etc. In other words, it can be very useful to generate health data in order to better care for people; but how these data are produced and used must be carefully thought out and organized, primarily by caregivers working hand in hand with computer specialists. At a minimum, of course, patients must consent to the collection of their data, but ways should also be found to involve them more in what could become a way of collectivizing their medical experience.

What technical solutions do you think would allow data to be used to better care for people?

Within our association, we recommend that data storage and processing be decentralized by default. First of all, a decentralized computer system is inherently less fragile: technically, if one of the files or databases is corrupted, the whole system is not. Allowing hospitals, which produce a great deal of data, to store it themselves and choose how to analyse it seems to us a safer solution against hacking, and also a better one in terms of medical ethics. For example, through its representatives, the hospital’s users’ committee will be able to have a say on a given research protocol.
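One common way to analyse data without moving it out of hospitals is to run the same query locally at each site and pool only aggregate counts, an approach often called federated analysis. The article does not name this technique; the Python sketch below is an illustrative assumption, with hypothetical function names and fictitious records, showing that a pooled prevalence can be computed while patient-level records never leave each hospital.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate counts that a hospital agrees to share; no patient-level rows."""
    n_patients: int
    n_with_condition: int


def local_summary(records, condition):
    """Runs inside a single hospital's own data warehouse."""
    return LocalSummary(
        n_patients=len(records),
        n_with_condition=sum(1 for r in records if r["diagnosis"] == condition),
    )


def pooled_prevalence(summaries):
    """Combines the shared counts from several hospitals."""
    total = sum(s.n_patients for s in summaries)
    cases = sum(s.n_with_condition for s in summaries)
    return cases / total if total else 0.0


# Fictitious example: two hospitals, each keeping its own records.
hospital_a = [{"diagnosis": "K35"}, {"diagnosis": "J10"}, {"diagnosis": "J10"}, {"diagnosis": "I21"}]
hospital_b = [{"diagnosis": "K35"}, {"diagnosis": "K35"}] + [{"diagnosis": "J10"}] * 4

summaries = [local_summary(hospital_a, "K35"), local_summary(hospital_b, "K35")]
print(pooled_prevalence(summaries))
```

Only the two small `LocalSummary` objects cross hospital boundaries. Very small counts can themselves be identifying, so real deployments usually add safeguards such as minimum cell sizes, which this sketch does not show.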

Is it really in our interest to completely separate, on the one hand, the entities that produce “the” data (hospitals, pharmacies, medical laboratories, doctors…) and, on the other, the structures that host this data for research and carry out part of its algorithmic processing? This is a political question. We believe that a balance must be found: carefully thought-out cross-referencing of data from different files can certainly lead to discoveries and to work on new medical treatments. However, we are convinced that, in practice, it is not medical research and patients who will benefit most from a big-data behemoth like the Health Data Hub.

Instead, we need to think about a modular, decentralized and open source architecture together with the primary stakeholders: caregivers, patients and the users of medicine. Collectively and continuously, we must ask ourselves what kind of medicine we want and imagine arrangements that allow us to use data here and there to better care for people.

In fact, this is already happening in hospitals, which carry out massive data analysis and AI within structures called Health Data Warehouses. The AP-HP (Paris public hospitals), for example, stores and processes the data of more than 11 million patients using open source software, one of whose objectives is to facilitate knowledge sharing between researchers.

With this Platform, set up in haste, without consultation, and entrusted to Microsoft in violation of the rules for awarding public contracts, we are at the opposite extreme from a collaborative project. In many ways, this project is reminiscent of the SAFARI project (Automated System for Administrative Files and Directory of Individuals) of the 1970s. The idea was to interconnect the databases of several institutions (police, finance ministry, labour ministry, land registry) using the single social security number. This data centralization project met strong opposition and never came to fruition, but it led to the creation of the CNIL.

In the case of the HDH, the CNIL was largely bypassed and, by choosing an American company, the State has in addition potentially deprived us of the already imperfect data protection offered by European law. The Platform is a bit like our SAFARI of the 2020s, but this time hosted by Microsoft.

Let’s come back to the opportunity, that is to say the usefulness, of this huge database to advance medical treatment and better care for people. In your opinion, is it really necessary to encourage artificial intelligence in the medical field?

What is artificial intelligence? Of course, it has nothing to do with intelligence in the usual sense of the term, as the capacities of a computer are always ultimately quantified and therefore limited. Artificial intelligence is a form of computer processing, based on algorithms, that allows a decision-making process to be delegated to a machine. In its most accomplished current version, AI uses algorithms that incorporate learning rules: as the computer takes in and processes data, it automatically refines the way it processes them. Yann LeCun, one of the pioneers of this “deep learning” method, who now heads AI research at Facebook, explains that it is a class of algorithms that simulate human intelligence. He forgets to mention that for the learning rules to work, and for the machine eventually to spit out statistics and “trends”, it still requires very large flows of data.
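The idea that a learning algorithm refines its processing as data comes in can be illustrated with the simplest possible learning rule. The Python sketch below is not deep learning: it is stochastic gradient descent on a one-variable linear model, the same basic kind of update that deep learning applies to millions of parameters. The data, learning rate and number of passes are arbitrary assumptions chosen for illustration.

```python
def train_online(samples, learning_rate=0.01, epochs=2000):
    """Fit y ≈ w * x + b by stochastic gradient descent on the squared error.

    Each sample nudges the parameters a little: this is the 'learning rule'.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y  # how wrong the current model is on this sample
            w -= learning_rate * error * x  # gradient of 0.5 * error**2 with respect to w
            b -= learning_rate * error  # gradient of 0.5 * error**2 with respect to b
    return w, b


# Fictitious data generated by y = 2x + 1; the model has to discover w = 2, b = 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_online(data)
print(round(w, 3), round(b, 3))
```

Even this two-parameter toy needs thousands of small updates to settle on clean, noise-free data; with noisy real-world data and far more parameters, the appetite for data grows accordingly.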

The process already works quite well, in view of its own objectives, in many sectors: predicting consumer behaviour on the Internet, linking users on social networks, facial recognition or sorting sound archives, for example. But, to date, no AI is treating a patient in any hospital in the world. One day soon, no doubt, an algorithm will make it possible to detect certain tumours on X-rays and, as long as the doctor remains the keystone of the diagnosis, the computer tool will surely be a valuable help.

But in this case, the way in which the State has literally thrown itself into this project of a gigantic data platform, against the backdrop of a health emergency, is entirely open to criticism. That the algorithmic analysis of massive data brings a significant health gain over the use of data created and collected at a human scale is often presented as self-evident, although it remains largely unproven. What exactly will be gained by exponentially multiplying the number and types of health data collected and processed? Which areas of medicine will particularly benefit from these techniques? It is very difficult to predict, particularly because the material to be analysed appears infinite, undefined, and in any case elusive¹¹.

What is certain, on the other hand, is first of all the risk of data leaks and the associated risk of losing patients’ trust. If people lose trust in their doctors, and more broadly in those who care for them, this could have serious consequences for public health.

Did you know that in July 2017, the British data protection authority ruled that the health data of more than 1.5 million Londoners had been unlawfully passed to DeepMind Health, the artificial intelligence subsidiary owned by Google? DeepMind had a contract with a hospital trust of the British National Health Service (NHS) to develop an application to monitor patients with kidney failure, and those patients had not been properly informed about the use that would be made of their medical records¹²…

There is, of course, another major risk attached to this type of Platform: this gigantic mass of data is literally a gold mine in the eyes of all the big companies seeking to develop their artificial intelligence programmes, chief among them insurers and mutual insurance companies. If this data falls into their hands, they will be able to adjust their rates according to the risks presented by this or that category of people. Soon, the probabilities produced by algorithms could be used to refuse a patient access to a particular treatment, given its high cost and chances of success deemed too slim…

We argue for bringing care and research closer together. Medicine is practised in real time: when a patient comes to see you, you don’t want to tell him: “wait 1.3 months for a good algorithm to be determined and for the computer to designate the treatment with the best chances of success”. We, doctors and computer scientists, want to work together to develop research algorithms that can be used as quickly as possible to treat people. This is made possible by decentralised research, carried out within the various hospital data warehouses.

The Platform’s engineers will do exactly the opposite: they will work on data they know almost nothing about. Far removed from the centres where the data was collected, they will “clean” it and make it usable by algorithms, some of which may be the exclusive property of Microsoft… Moreover, even if a Platform researcher concludes, thanks to his computer program, that a given patient should receive a given treatment, it will be very difficult to ensure that the patient actually gets it, since the researcher has no direct access to patients. In other words, the further data processing is moved away from the places where care is given, the more the therapeutic benefits become, let’s say, “abstract” and distant. This is another real ethical issue raised by this Platform.

That being said, it is true that for certain research projects, to be identified case by case, extreme centralisation of data may prove necessary; this is the case, for example, when trying to understand and treat so-called rare diseases¹³. In such cases, it makes sense to centralise existing data in order to obtain a critical mass of data to study and work on.

Finally, we must bear in mind that these files, in other words the medical databases produced by hospitals, are not, in their current state, designed to adequately feed the huge algorithms of artificial intelligence. These medical files were designed not for research but for care, and the two approaches are naturally very different. In practice, today’s medical databases are built so that caregivers have access to all the data about one patient. To do research, one needs access to one specific variable, for example the person’s age, across thousands of patients. It is, in a way, the mirror image of the approach taken in care. The most experienced computer scientists believe that it takes at least five years to “qualify” the work carried out by the most advanced hospitals, i.e. to make their health data usable by a giga-platform whose large virtual “drawers” are standardised and therefore interconnectable. We therefore have plenty of time to think about and set up decentralised infrastructures that researchers and patients would be able to control.
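To make the contrast between the two access patterns concrete, here is a minimal Python sketch. It assumes a hypothetical, heavily simplified record format: the `PatientRecord` type, its fields and the two helper functions are invented for illustration and do not reflect any real hospital system. Care reads one patient's whole record; research reads one variable across every patient.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, heavily simplified hospital record: one entry per patient,
# holding everything a caregiver may need at the bedside.
@dataclass
class PatientRecord:
    patient_id: str
    age: int
    diagnoses: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


def record_for_care(records: dict[str, PatientRecord], patient_id: str) -> PatientRecord:
    """Care-oriented access (hypothetical helper): all data about one patient."""
    return records[patient_id]


def variable_for_research(records: dict[str, PatientRecord], variable: str) -> list:
    """Research-oriented access (hypothetical helper): one variable, every patient.

    The data is organised patient by patient, so extracting a single field
    means walking every record; a research-oriented store would instead keep
    each variable as its own standardised column.
    """
    return [getattr(record, variable) for record in records.values()]


if __name__ == "__main__":
    records = {
        "p1": PatientRecord("p1", 54, diagnoses=["type 2 diabetes"]),
        "p2": PatientRecord("p2", 71, diagnoses=["chronic kidney disease"]),
        "p3": PatientRecord("p3", 38),
    }
    print(record_for_care(records, "p2"))
    print(variable_for_research(records, "age"))  # [54, 71, 38]
```

In a real warehouse, the hard part is not this loop but making sure that a field such as a diagnosis is coded the same way in every hospital, which is the “qualification” work mentioned in the text.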

It is clear that there are big financial stakes behind the HDH…

A “mission to prefigure the Platform” was set up, led by a professor of medicine, the president of the National Institute of Health Data (which the Platform was to absorb) and Gilles Wainrib, co-director of Owkin, a big startup in AI for health financed in particular by the Public Investment Bank and… Google Ventures. Before the old institute and the HDH merged, the project was led by the Direction de la recherche, des études, de l’évaluation et des statistiques (DREES), a government department then headed by Jean-Marc Aubert, a tireless promoter of the Platform project. The revelations made by the newspaper Le Monde about this high-ranking civil servant are edifying¹⁴: until October 2017, when he was appointed director of the DREES, Mr. Aubert was employed by the company Iqvia (formerly IMS Health), the world’s largest merchant of health data, as director for “patient solutions” in the United States. As soon as the new platform was launched, our graduate of the École polytechnique went back to work for Iqvia, this time as the big boss of the French subsidiary… No comment needed.

Finally, concerning the STOP-COVID application, which in any case comes too late to be used against the epidemic, you are sceptical about the reliability of the device from a medical point of view. Why?

For several reasons. Firstly, to be alerted of a ‘contact’ with an infected person, you must in principle have spent 15 minutes less than one metre from that person. But in the metro, for example, where masks are compulsory, the one-metre distance and the 15 minutes of contact are irrelevant. Another example: if you are in a room adjacent to the one where a sick person lives, you can be alerted, because Bluetooth does not take into account the wall, which nevertheless fully protects you from contagion. Moreover, Bluetooth is an energy-hungry technology that can drain our batteries so quickly that we would all rather turn the application off so that we can keep using our phones day to day.
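The alert criterion described above can be sketched as a simple rule. This is a minimal Python illustration under loudly stated assumptions: the `Encounter` type and its fields are hypothetical, the 15-minute threshold is assumed to be cumulative across encounters, and the code is not the actual STOP-COVID implementation, which relied on the ROBERT protocol and estimated proximity indirectly from Bluetooth signals. It shows why a rule that only sees an estimated distance cannot know about a wall.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds as described in the interview. Treating the 15 minutes as
# cumulative across several encounters is an assumption of this sketch.
MAX_DISTANCE_M = 1.0
MIN_DURATION_MIN = 15.0


@dataclass
class Encounter:
    """One hypothetical observation of another phone nearby."""
    duration_min: float          # how long the other phone was detected
    estimated_distance_m: float  # distance guessed from the radio signal
    wall_between: bool = False   # a fact about the room, unknown to the app


def is_risky_contact(encounters: list[Encounter]) -> bool:
    """Flag a contact when enough time was spent 'close' to the other phone."""
    close_time = sum(
        e.duration_min
        for e in encounters
        if e.estimated_distance_m < MAX_DISTANCE_M
    )
    # wall_between is deliberately never read: the rule only sees the
    # estimated distance, so a partition wall cannot change the outcome.
    return close_time >= MIN_DURATION_MIN


if __name__ == "__main__":
    neighbour = [Encounter(20, 0.8, wall_between=True)]
    brief = [Encounter(5, 0.5)]
    print(is_risky_contact(neighbour))  # True
    print(is_risky_contact(brief))      # False
```

Under this rule, a neighbour detected at an estimated 0.8 metres for 20 minutes through a partition wall triggers an alert, while a five-minute brush past someone does not, and neither masks nor walls play any part in the decision.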

Above all, everyone is well aware of the enormous tracking potential of this type of device. A group of researchers specialising in cryptography, security and technology law explains very well the risks it poses, for example during a job interview or when visiting an apartment we would like to rent¹⁵.

One of the computer researchers in this group has also shown that the STOP-COVID application stores more information than is officially stated¹⁶. He was able to discover this because the source code of the application, developed by INRIA, is entirely open: anyone can access it. Open source code also allows independent computer scientists who specialise in data security to check that the software offers the security guarantees that patients are entitled to expect. This further illustrates the need to use software whose source code is fully published. Here again, not all of the data processing software that Microsoft will use to run the Platform will be published… Computer specialists will therefore be able to carry out literally secret processing operations…

To come back to STOP-COVID, the failure of the device was in fact made official by Cédric O, then Secretary of State for Digital Affairs: after 3 weeks of activity and 1.9 million downloads, the application had sent only 14 alert notifications. But it is highly likely that this was a trial run mainly intended to get us used to tracing and to make us believe that we should, at all costs, accept being traced for the collective good.

We believe that, given the dangers to our freedoms, we must renounce this kind of technology and trust human beings to stem future epidemics by other means. Computers can of course help us do many things, but with political structures as pyramidal as ours, we must be wary of them and fight uses that feed every appetite for power and control.

1. Faced with this change in scale, the CNIL has pointed out the difficulty of enforcing in practice the two main principles applicable to data files: collecting as little data as possible and limiting the use of data to the purpose of each file.
2. Presentation of Proceedings
3. See R. Métairie, with AFP, “What is the Si-Vic platform, implicated in a possible filing of yellow vests?”, Libération, 26 April 2019.
4. Estimating the success of re-identifications in incomplete datasets using generative models
5. Google, Apple, Facebook, Amazon, Microsoft & co
6. Deliberation No. 2020-044 of 20 April 2020 giving its opinion on a draft decree supplementing the decree of 23 March 2020 prescribing the measures for the organisation and functioning of the health system necessary to deal with the covid-19 epidemic in the context of the state of health emergency
7. Health Data Hub - FAQ
8. Council of State, 19 June 2020, Health Data Hub Platform
9. Health Data Hub: Will our health data be delivered to Americans?
10. By order of the Council of State, the official website of the Platform now mentions this possibility. If you look carefully, you can read there: ‘Given the contract signed with its subcontractor and the functioning of the administration operations of the technological platform, it is possible that technical data on the use of the platform (which does not reveal any health information) may be transferred to administrators located outside the European Union’. https://www.health-data-hub.fr/outil-de-visualisation, heading ‘project directory’, page 127, project number 3173.
11. For examples of algorithms in medicine, see in particular L. Galanopoulo, ‘Des logiciels experts en diagnostic médical’, in Carnet de Science n°3, CNRS, 2017, p. 103.
12. Google ‘betrays patient trust’ with DeepMind Health move
13. The threshold accepted in Europe, which is arbitrary, is fewer than one person in 2,000 affected. https://www.orpha.net/consor/cgi-bin/Education_AboutRareDiseases.php?lng=FR
14. St. Foucart and St. Horel, “Health data: conflict of interest at the heart of the new Platform”, Le Monde, 24 December 2019.
15. Anonymous tracking, dangerous oxymoron
16. StopCovid, the app that knew too much