The effect of Big Data on health care and how data scientists can help

At the start of the 2010s, “big data” became one of the hottest buzzwords in IT. Its sudden prominence in technology journalism, and within fields as distinct from each other as health care and manufacturing, was the product of several converging trends:

  • Starting in the 2000s, virtualization – i.e., the ability to emulate computer operating systems and applications on different hardware than they were originally built for – enabled a shift from on-prem (local) information technology (IT) infrastructure to the vastness of cloud computing and collocated data centers.
  • With this newfound ability to elastically scale their computing power, data storage and IP networking, organizations had what they needed to collect and analyze the massive amounts of information being generated by networked applications. The hope was, and still is, to draw precise actionable insights from these huge datasets.
  • In a late 2016 report, IBM estimated that 90 percent of all the world’s data had been generated within the two previous years. The surge is traceable to the spread of smartphones and tablets, as well as the emergence of the enormous Internet of Things (IoT), which Gartner believes will encompass 8.4 billion connected “things” in 2017.

In industries such as health care, these new realities of information generation and collection offer exciting possibilities – if big data (also known as data analytics) practices can be effectively instituted. Extracting the most value from large quantities of data requires a special combination of mathematical, technical and communication skills, which data scientists can apply to questions such as what underlying genetic and environmental causes connect various diseases. Data science capabilities also enable the long-term implementation of industry-specific IoT applications, more interconnected IT systems and superior value-based reimbursement (VBR) models.

Disease mapping: A case study in big data’s impact in health care

What if a drug designed for the treatment of a specific condition had broader uses? Researchers at the University of Chicago attempted to answer this question by digging into a massive dataset of insurance claims covering more than 480,000 people in nearly 130,000 families.

Their results led them to challenge traditional nosologies (i.e., classifications of diseases) by grouping together seemingly unrelated ailments. For instance, they discovered a genetic correlation between hypertension and type 1 diabetes, even though the two affect different systems of the body and sometimes have different onset ages.

In medical practice, these big data insights could inform more effective therapies, potentially through the repurposing of medications originally indicated for something else. There are already numerous examples on this front: The drug raloxifene was approved in 2007 for reducing the risk of invasive breast cancer in postmenopausal women, after it had initially been approved for the prevention of osteoporosis. Through the study of analytics, data scientists in health care may be able to broaden the scope of such innovations and help counter the steady slowdown in drug discovery over recent decades (a phenomenon sometimes labeled Eroom’s Law).

Data scientists may also help improve the quality – and establish the limitations – of genetic testing for diseases, a technique that has become more popular outside of labs thanks to direct-to-consumer kits. Another large-scale study, this one of data from 500,000 families in the U.K., found that the role of genetic variation in the incidence of 12 common diseases had been overestimated by almost 50 percent. Both patients and providers can benefit from big data insights into how environmental, rather than genetic, causes underpin many widespread afflictions.


What else can data scientists bring to health care?

Beyond this data-driven alignment of diseases, causes and drugs, health care has many other uses for analytics, especially on the IT and administrative sides. Health care is the largest industry in the U.S.: The Centers for Medicare and Medicaid Services estimated that in 2015, per capita spending on health was almost $10,000. At that size, it needs well-designed technical infrastructure to support efficient electronic health record (EHR) management along with better IT system interoperability, both of which are necessary for putting care quality and cost on sustainable courses. Data scientists can contribute by working on:

1. Data analytics for VBR

VBR is the successor to the longtime practice of fee-for-service (FFS). Under FFS, providers are compensated based on the quantity, rather than the quality, of visits. With VBR, payment depends upon the measurable outcomes of care. It sounds simple in theory, but there are many practical challenges, starting with how to define “value.”

This is where data scientists can help. Analytics might be applied in the tracking of patient histories across multiple provider sites (did their health improve or decline as they visited different locations?), or in the comparison of the efficacy of similar treatments within specific populations. Data scientists also help create coherent, unified information repositories that can replace the siloed and ad hoc storage systems currently impeding progress toward VBR, not to mention complicating cybersecurity, regulatory compliance and overall patient safety.
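As a rough illustration of tracking patient histories across provider sites, the Python sketch below groups visit records by patient and reports whether an outcome measure rose or fell between the first and last visit. Everything in it is an assumption made for illustration: the record layout, the site names, the sample data and the single `health_score` field, which stands in for whatever measurable outcome a real VBR program would define.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (patient_id, provider_site, visit_date, health_score).
# health_score is an illustrative stand-in for any measurable outcome of care.
VISITS = [
    ("p1", "clinic_a", date(2017, 1, 10), 62),
    ("p1", "hospital_b", date(2017, 3, 2), 70),
    ("p2", "clinic_a", date(2017, 2, 5), 80),
    ("p2", "clinic_c", date(2017, 4, 20), 74),
    ("p1", "clinic_c", date(2017, 5, 15), 75),
]


def outcome_trends(visits):
    """Return {patient_id: (sites_in_visit_order, score_change)}."""
    by_patient = defaultdict(list)
    for patient_id, site, visit_date, score in visits:
        by_patient[patient_id].append((visit_date, site, score))

    trends = {}
    for patient_id, records in by_patient.items():
        records.sort()  # chronological, because the date is the first field
        sites = [site for _, site, _ in records]
        change = records[-1][2] - records[0][2]  # last score minus first score
        trends[patient_id] = (sites, change)
    return trends


if __name__ == "__main__":
    for patient_id, (sites, change) in sorted(outcome_trends(VISITS).items()):
        if change > 0:
            direction = "improved"
        elif change < 0:
            direction = "declined"
        else:
            direction = "unchanged"
        print(patient_id, " -> ".join(sites), direction, change)
```

A real repository would need to reconcile patient identifiers across sites before any such comparison is meaningful, which is exactly the kind of unification work that siloed storage makes difficult.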

The health care industry accounts for almost one-third of the world’s stored data, according to the Ponemon Institute. But without a data science strategy, this vast sea of information cannot be wrangled into a source of advantage for patients, providers and payers.

2. Machine-to-machine communications in the IoT

In health care, the IoT will incorporate wearable trackers, sensors and cloud-based applications constantly exchanging data with one another. Allied Market Research estimated that $136 billion could be invested in the health care IoT by 2020.

With so much new infrastructure coming online, health care providers will see a surge in the already massive amounts of data under their purview. All this information from the IoT can be useful for health monitoring, coordination of appointment scheduling and wellness initiative/rewards systems, as long as its flow between devices is well-controlled.

Since data scientists have backgrounds in computer programming and mathematics, they might work on how IoT data is collected, stored and analyzed. Machine-to-machine communications between IoT devices could be a particularly important focus area when optimizing a health care organization’s data analytics processes.
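To make the collect-and-analyze step more concrete, here is a minimal Python sketch that filters implausible readings from wearable devices and then averages what remains per device. The device names, the `bpm` field and the plausibility range are all hypothetical choices for this example, not values drawn from any real device or clinical standard.

```python
from statistics import mean

# Hypothetical plausible range for a heart-rate reading, in beats per minute.
PLAUSIBLE_BPM = (25, 250)

# Hypothetical readings as they might arrive from wearable trackers.
READINGS = [
    {"device_id": "wearable-1", "bpm": 72},
    {"device_id": "wearable-1", "bpm": 0},  # sensor dropout, filtered out
    {"device_id": "wearable-1", "bpm": 76},
    {"device_id": "wearable-2", "bpm": 88},
]


def summarize_heart_rate(readings):
    """Drop out-of-range readings, then average the rest per device."""
    low, high = PLAUSIBLE_BPM
    by_device = {}
    for reading in readings:
        bpm = reading["bpm"]
        if low <= bpm <= high:
            by_device.setdefault(reading["device_id"], []).append(bpm)
    return {device: mean(values) for device, values in by_device.items()}


if __name__ == "__main__":
    for device, average in sorted(summarize_heart_rate(READINGS).items()):
        print(device, average)
```

Validating data close to where it is collected is one way to keep the flow between devices well-controlled before readings reach downstream analytics.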

Preparing for a career in health care data science

Data science is still in its infancy in health care, but growth will undoubtedly be driven by an aging population, upgrades to existing IT infrastructure and incentives to streamline care costs. At the University of California, Riverside, you can earn a master of science in engineering with a specialization in data science in a year and prepare yourself to become a leader in the growing field of data analysis.

