ILLAWARRA MEDICAL: Instrumenting a health network for large-scale medical data gathering.



Two of the biggest challenges facing the medical community relate to data and process. We need to understand how to:

  • Gather and leverage healthcare data in a ubiquitous fashion
  • Manage and improve healthcare processes in a standardized fashion

Specifically, the aims of this project are:

  • To develop a framework for large-scale ubiquitous process automation within a geographically defined health service area
  • To devise the instrumentation required for pervasive healthcare data gathering
  • To implement a range of data analytics to leverage the data collected 
  • To develop a “packaged” collection of software tools and methodologies that can be commercially offered to other area health services


Gathering medical data on a large scale and in a ubiquitous fashion offers several benefits:

  1. The possibility of implementing sophisticated data analytics, leading to the discovery of patterns that yield potentially critical medical and clinical insights.
  2. The potential for generating alerts when certain patterns are observed in patient-specific data instances.
  3. The potential for monitoring efficiency and efficacy of health service delivery within a health network.
  4. The potential to feed back the results of data mining to individual patients.

An infrastructure that supports pervasive medical process management within a health network provides significant benefits:

  1. The ability to implement standardized and best-practice processes at all levels.
  2. The ability to dramatically improve efficiency via process automation across primary, secondary and tertiary providers.
  3. The ability to assess and manage compliance with regulatory requirements in a pervasive fashion.
  4. The ability to incorporate decision-support functionality at key points within these processes.
  5. The ability to instrument mechanisms for quality assuring a range of medical/healthcare decisions.
  6. The ability to include the patient in decision making and in the development of healthy habits.

While most of the methodologies and technologies to be utilised are well developed within their own disciplines, an approach that combines large-scale medical data gathering, analytics and mining to translate informatics into medical application inherently poses many challenges:

  1. Instrumenting a geographical area’s health network for pervasive medical/health data collection has never been done before. This project will develop the architecture for such a network-wide data collection infrastructure. It will implement the infrastructure in phases and, through a series of pilots, establish that the collected data delivers on the value propositions listed above.
  2. There are several key functional and non-functional requirements for such a system. 
    1. The data collected must be tagged with rich semantics. Thus, a pathology test result must be temporally and conceptually correlated with a prior or subsequent diagnosis, the initial triggers for the test (such as a set of symptoms presented, or a mandatory step in a given treatment protocol), the subsequent treatment steps, outcomes and so on.
    2. The infrastructure must be compliant with stringent security and privacy regulations.
    3. Data collection must be minimally obtrusive. A key challenge is associating the data collected with the right semantic context. While a pathology lab may, for instance, supply data tagged with the appropriate patient identifier and referral information (referring physician, date and so on), such data must also carry a meaningful description of context.
  3. At present, it is difficult to answer questions such as: Why was the test requested? What diagnosis, if any, did the test results lead to? What stage of a treatment protocol (if applicable) was the patient on when this test was conducted? What were the clinical outcomes for this patient? We propose to drive data collection via a comprehensive framework for medical/health process management.
  4. At present, there exists a plethora of proprietary electronic medical record (EMR) products deployed and used by GPs. While these systems are customisable, the degree of customisation that a particular vendor can sustain for the purposes of large-scale data gathering and research is small. The changes that vendors do make are commercially driven and may produce no benefit in terms of data mining potential.
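
The semantic-tagging requirement in challenge 2 above can be illustrated as a linked record structure. This is a minimal sketch only; the record types and field names are illustrative assumptions, not a proposed schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Each pathology result carries explicit links to the clinical context
# that produced it (trigger, protocol step) and to what followed
# (diagnosis, outcome), so the data is never semantically "orphaned".

@dataclass
class Trigger:
    kind: str              # e.g. "symptoms-presented" or "protocol-step"
    description: str

@dataclass
class PathologyResult:
    patient_id: str
    test_name: str
    result_value: str
    collected_on: date
    referring_physician: str
    trigger: Trigger                      # why the test was ordered
    protocol_step: Optional[str] = None   # position in a treatment protocol
    subsequent_diagnosis: Optional[str] = None
    outcome: Optional[str] = None

# A result tagged with its full clinical context (hypothetical data):
r = PathologyResult(
    patient_id="P-1042",
    test_name="HbA1c",
    result_value="7.9%",
    collected_on=date(2024, 3, 1),
    referring_physician="Dr A. Nguyen",
    trigger=Trigger("symptoms-presented", "polyuria, fatigue"),
    subsequent_diagnosis="type 2 diabetes",
)
```

With such links in place, questions like "why was this test requested?" reduce to following references rather than manual chart review.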


This project will address a range of open questions:

Scoping and architecting ILLAWARRA MEDICAL:

Rapid process discovery for healthcare processes: A key problem in deploying process automation within an area health network is documenting and modeling the very large number of processes involved. Normative/best-practice guidelines exist in some cases. For instance, a large number of clinical protocol guidelines are documented at the U.S. National Guidelines Clearinghouse.

Some commercial vendors sell normative healthcare process models (e.g. Map of Medicine). Between them, these sources cover only a small fraction of the healthcare processes of interest. Traditional business analysis and “from-scratch” process modeling techniques are unlikely to be of use, given the scale of the process modeling challenge and the speed at which these processes must be modeled and deployed. In our earlier work on a Rapid Process Discovery toolkit [REF: RPD], we have developed machinery that extracts “proto-process models” by mining legacy text (e.g. memos, procedure manuals) and model artefacts. This approach could be leveraged in the present context.
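
To give a flavour of mining proto-process steps from legacy text, the toy heuristic below treats imperative sentences in a procedure manual as candidate steps. This is an illustrative stand-in only, not the Rapid Process Discovery toolkit's actual algorithm; the verb list and sample text are invented:

```python
import re

# Toy heuristic: sentences opening with an action verb become candidate
# process steps, preserved in document order. A real process-discovery
# pipeline would use NLP plus mining of existing model artefacts.

ACTION_VERBS = {"record", "refer", "order", "review", "notify", "schedule"}

def extract_proto_steps(text: str) -> list[str]:
    steps = []
    for sentence in re.split(r"[.\n]+", text):
        words = sentence.strip().split()
        if words and words[0].lower() in ACTION_VERBS:
            steps.append(" ".join(words))
    return steps

manual = """Record the patient's symptoms. Order a pathology test.
Review the results with the patient. Notify the referring GP."""

print(extract_proto_steps(manual))
# A candidate four-step proto-process, in document order.
```

The output is only a "proto" model: ordering and branching logic would still need validation by a business analyst or clinician.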

Fast process instrumentation: Process instrumentation is critical to the research vision. We use the term to denote the use of process engines as coordination tools in the execution of healthcare processes. We do not assume that healthcare processes can be automated in the traditional sense, since the vast majority of the process steps will be practitioner-mediated. A coordination engine ensures that the requisite set of steps is executed (by human or machine) in the right order, following the correct execution logic. Off-the-shelf process engines can be used, but a modicum of tool customization is required. Given the scale of the task, novel methodologies are required that can roll out process instrumentation in a standardized, quality-assured and minimally intrusive fashion.
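
The coordination role described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (no real work-item dispatch, no off-the-shelf engine), showing only that steps are released in dependency order:

```python
# Minimal coordination-engine sketch: steps execute only once their
# dependencies are complete. In the envisaged system most steps would be
# practitioner-mediated; "done" here stands in for dispatching a work item.

class Step:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.done = False

def run(steps):
    """Execute steps respecting dependencies; return the execution order."""
    order = []
    pending = list(steps)
    while pending:
        for s in pending:
            if all(d.done for d in s.depends_on):
                s.done = True          # in reality: dispatch to a clinician
                order.append(s.name)
                pending.remove(s)
                break
        else:
            raise ValueError("cyclic or unsatisfiable dependencies")
    return order

triage = Step("triage")
test = Step("order pathology test", depends_on=[triage])
review = Step("review results", depends_on=[test])

# Steps submitted out of order; coordination restores the correct logic.
execution_order = run([review, test, triage])
print(execution_order)
```

A production engine would add parallel branches, timeouts and escalation; the point here is only the coordination contract.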

Data architecture: A comprehensive architecture for a data repository of the kind being targeted here has never been designed. The repository will make provision for both data in current use as well as data that would be migrated to a data warehouse. 

The data architecture would need to support a range of sophisticated queries, as well as OLAP-style analytics. The repository would also need to be designed with the application of a range of data mining and machine learning algorithms in mind. The use of new approaches such as tag-based databases should be considered.
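
As a concrete illustration of the tag-based approach mentioned above, the sketch below indexes records by semantic tags rather than fixed table schemas, which suits OLAP-style slicing. The class and its API are assumptions for illustration, not a committed design:

```python
from collections import defaultdict

# Sketch of a tag-based record store: every record is indexed under each
# of its semantic tags, and queries intersect tag sets (AND semantics).

class TagStore:
    def __init__(self):
        self._by_tag = defaultdict(set)   # tag -> set of record ids
        self._records = {}                # record id -> record

    def put(self, record_id, record, tags):
        self._records[record_id] = record
        for t in tags:
            self._by_tag[t].add(record_id)

    def query(self, *tags):
        """Return records carrying ALL of the given tags."""
        ids = set.intersection(*(self._by_tag[t] for t in tags))
        return [self._records[i] for i in sorted(ids)]

store = TagStore()
store.put(1, {"test": "HbA1c"}, ["pathology", "diabetes", "2024"])
store.put(2, {"test": "lipid panel"}, ["pathology", "2024"])

print(store.query("pathology", "diabetes"))   # → [{'test': 'HbA1c'}]
```

Slicing by tags like condition, year or provider in this way is one route to supporting the sophisticated queries the repository must serve.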

Data analytics: We need to support three kinds of analytics:

  • Data monitoring: Configurable machinery could be used to institute patient-specific monitors that use complex event processing technology to analyse patient-specific data streams (consisting, for instance, of a sequence of reports from GP visits, pathology tests, scans, diagnoses, medication and other treatment instances) to generate alerts for potential conditions that might otherwise have been missed.
  • Data mining/machine learning techniques: In an anonymized form, a large data warehouse of health data would provide a rich lode of patterns and generalizations that could be extracted by the application of data mining and machine learning techniques. A key challenge is to define a feasible subset of the repertoire of available data mining/learning techniques that would be integrated/deployed in the system.
  • Process analytics: The rich body of data acquired could also provide a useful basis for understanding the effectiveness of the healthcare processes in place. This data could be used to generate process “heat-maps” that would help us understand the components of these processes that require better resourcing. This data would also provide process performance measures that could be used to re-design/optimize these processes.
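
The data-monitoring idea above can be sketched with a simple sliding-window rule standing in for full complex-event-processing machinery. The event types and the pattern are invented for illustration:

```python
# Illustrative patient-specific monitor: alert when the `pattern` event
# types occur as an ordered subsequence within a window of consecutive
# events in a patient's stream.

def monitor(events, pattern, window=3):
    alerts = []
    for i in range(len(events) - window + 1):
        win = [e["type"] for e in events[i:i + window]]
        it = iter(win)
        # subsequence match: each pattern element found, in order
        if all(p in it for p in pattern):
            alerts.append(i)
    return alerts

stream = [
    {"type": "gp-visit"},
    {"type": "abnormal-pathology"},
    {"type": "gp-visit"},
    {"type": "no-follow-up"},
]

# Flag an abnormal result that was not followed up within the window.
print(monitor(stream, ["abnormal-pathology", "no-follow-up"], window=3))
```

A real deployment would use a dedicated CEP engine with temporal operators; the sketch only shows the shape of a patient-specific rule.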

Lifestyle data: It is impossible to specify the data mining to be undertaken until the data acquired is known, but through collaboration with the primary, secondary and tertiary health providers, vendors and the financial institutions used by patients, it is intended that substantial lifestyle data will be collected.

Privacy and security:

Change management strategies: A socio-technical system of this size and complexity will need to be frequently updated to reflect changes in the underlying reality. It is critical that we avoid an operating model where every change in the structure and operations of the area health network requires significant re-implementation and personnel-intensive system updates. The solution is to devise methodological and (in some instances) tool support for propagating changes in the real-world context through the system. This would require a set of change deployment strategies custom-designed for an (anticipated) repertoire of changes. More complex machinery for handling un-anticipated changes and exceptions could also be deployed. Ultimately, this would lead to a complete governance structure for systems of this kind.
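
One way to picture the repertoire of change deployment strategies is a registry that maps anticipated change types to pre-designed handlers, with unanticipated changes escalated for human handling. The change types and handler behaviour below are illustrative assumptions only:

```python
# Sketch of a change-deployment registry: anticipated change types map
# to pre-designed propagation strategies; anything unrecognised is
# escalated rather than silently ignored.

HANDLERS = {}

def strategy(change_type):
    def register(fn):
        HANDLERS[change_type] = fn
        return fn
    return register

@strategy("new-clinic-opened")
def add_clinic(detail):
    return f"provision process endpoints for {detail}"

@strategy("protocol-revised")
def update_protocol(detail):
    return f"redeploy instrumented model for {detail}"

def propagate(change_type, detail):
    handler = HANDLERS.get(change_type)
    if handler is None:
        return f"escalate: no strategy for '{change_type}'"
    return handler(detail)

print(propagate("protocol-revised", "asthma care plan"))
print(propagate("merger-of-providers", "two GP practices"))
```

The escalation path is where the more complex machinery for unanticipated changes, and ultimately the governance structure, would attach.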

Communication with the broader community:

A “shrink-wrapped” toolkit and methodology: A critical commercial outcome is the ability to roll out the same project in other area health services. This requires that the tools and methodologies developed be packaged in a manner that clients can purchase and deploy in their own contexts with relative ease. The tools and methodologies should be licensed as open source so that further development can be undertaken by other users to suit their needs.



Task 1: Develop an infrastructure for collaborative agent-based optimization.