Journal Club

Mol. Cells 2022; 45(9): 620-621

Published online September 30, 2022

https://doi.org/10.14348/molcells.2022.0119

© The Korean Society for Molecular and Cellular Biology

Data Speak How to Treat Disease

Big data-based precision medicine

Daehee Hwang *

School of Biological Sciences, Seoul National University, Seoul 08826, Korea

Correspondence to: daehee@snu.ac.kr

Received: July 28, 2022; Accepted: August 2, 2022

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.


Fig. 1. Schematic overview of big data-based precision medicine platform.

As vast amounts of diverse data have accumulated in public repositories (Athar et al., 2019; Barrett et al., 2013; Tryka et al., 2014), it has become possible to decode the relationships and rules that these data contain. For instance, whenever you open the Facebook application on your mobile phone, you see a list of “people you may know.” People who share many acquaintances with you are added to this list based on the social network stored in the Facebook database, and because the suggestions are derived from actual relationships in the data, they often appear quite accurate. Similarly, people check whether a specific English phrase is grammatically correct by searching for it on Google. The phrase's grammatical correctness can then be judged by how often it appears in documents, assuming that frequently used phrases are grammatically correct.
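The “people you may know” idea can be illustrated with a toy common-acquaintance count. The article does not describe Facebook's actual method; the sketch below simply assumes that candidates are ranked by how many friends they share with the user. The function name `suggest_friends` and the small network are hypothetical.

```python
from collections import Counter


def suggest_friends(graph, user, top_n=3):
    """Rank people who are not yet friends of `user` by how many
    acquaintances they share with `user` (common-neighbor count)."""
    friends = graph.get(user, set())
    shared = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Ignore the user and anyone who is already a friend.
            if candidate != user and candidate not in friends:
                shared[candidate] += 1
    # most_common returns (person, shared_count) pairs, largest count first.
    return shared.most_common(top_n)


# Hypothetical toy social network (undirected: each friendship is listed both ways).
graph = {
    "you": {"ana", "ben", "cho"},
    "ana": {"you", "dan", "eve"},
    "ben": {"you", "dan", "eve"},
    "cho": {"you", "dan"},
    "dan": {"ana", "ben", "cho"},
    "eve": {"ana", "ben"},
}

print(suggest_friends(graph, "you"))  # [('dan', 3), ('eve', 2)]
```

Here “dan” is suggested first because he is connected to all three of the user's friends. No explanation of *why* the two should know each other is produced; the ranking rests only on the relationships in the data.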

Many researchers are applying this big data-based approach to predict optimal therapeutic options for patients with refractory diseases. Suppose you have gathered the following data for 1 million patients with non-small cell lung cancer: 1) electronic medical records; 2) lifestyle information (foods, alcohol, smoking, exercise, etc.); 3) DNA mutations; 4) levels of cancer-related mRNAs, proteins, metabolites, and posttranslational modifications in lung tissue, ascites, blood, and urine samples; and 5) image data from magnetic resonance imaging, computed tomography, and ultrasonography. You then deposit all these data in a database equipped with exploration and search tools. A patient with advanced non-small cell lung cancer for whom an optimal therapeutic option cannot be determined is then searched against this database. Based on the medical, clinical, molecular, and image data, the search returns the 100 patients in the database with the most similar characteristics. Finally, the optimal therapeutic option is identified as the one associated with the best prognosis among the options used to treat these 100 patients, on the assumption that a therapy that worked well for similar patients will also work best for the new patient. Many industrial and medical sectors have begun to employ this big data-based precision medicine platform.
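The search-then-select workflow described above is essentially a k-nearest-neighbor lookup followed by a comparison of outcomes. The minimal sketch below assumes that every patient has already been encoded as a numeric, standardized feature vector, that similarity is plain Euclidean distance, and that prognosis is a single number where higher is better (for example, survival in months). `PatientRecord`, `nearest_patients`, `recommend_therapy`, and the therapy names are hypothetical, and the toy database is far smaller than the 100-patient neighborhood in the text.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PatientRecord:
    features: list   # hypothetical: pre-encoded, standardized patient features
    therapy: str     # therapy this patient received
    outcome: float   # hypothetical prognosis score; higher is better


def nearest_patients(query, database, k):
    """Return the k records closest to `query` by Euclidean distance."""
    return sorted(database, key=lambda rec: math.dist(query, rec.features))[:k]


def recommend_therapy(query, database, k=100):
    """Pick the therapy with the best mean outcome among the k most similar patients."""
    outcomes_by_therapy = defaultdict(list)
    for rec in nearest_patients(query, database, k):
        outcomes_by_therapy[rec.therapy].append(rec.outcome)
    # The therapy with the highest average outcome is taken as the "optimal" option.
    return max(
        outcomes_by_therapy,
        key=lambda t: sum(outcomes_by_therapy[t]) / len(outcomes_by_therapy[t]),
    )


# Hypothetical toy database with two features per patient.
database = [
    PatientRecord([0.1, 0.2], "therapy_A", 14.0),
    PatientRecord([0.2, 0.1], "therapy_B", 20.0),
    PatientRecord([0.0, 0.3], "therapy_A", 12.0),
    PatientRecord([5.0, 5.0], "therapy_C", 40.0),  # very dissimilar patient
]

print(recommend_therapy([0.1, 0.1], database, k=3))  # therapy_B
```

With `k=3` the dissimilar patient is excluded and `therapy_B` wins. With `k=4` that patient's strong outcome makes `therapy_C` the recommendation instead, which shows how sensitive the result is to how the similar patients are chosen. Averaging one number per therapy is the simplest possible choice. It ignores, for instance, how many neighbors received each therapy.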

Although this platform sounds promising, several issues must be addressed before it can be applied in clinical settings. First, just as Google does not explain why an English phrase is correct or incorrect when you use it to check grammar, this approach does not explain why the selected therapeutic option should work best, either clinically or mechanistically. Whether the predicted option is appropriate must therefore be evaluated using clinicians’ experience with and knowledge of the disease. Second, it is unclear how the 100 most similar patients should be chosen, because the relative importance of the medical, clinical, molecular, and image data in measuring the similarity between patients is unknown. Data scientists are currently developing methods to solve these problems so that big data-based precision medicine can be practiced in real clinical settings.
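The weighting problem can be made concrete with a small example. The article does not say how similarity is computed. Here it is assumed, purely for illustration, to be a weighted sum of per-modality Euclidean distances, with one feature vector per data type. The modality names, weights, and patient IDs are hypothetical.

```python
import math

# Hypothetical data types; each patient has one feature vector per modality.
MODALITIES = ("clinical", "molecular", "image")


def weighted_distance(a, b, weights):
    """Sum of per-modality Euclidean distances, each scaled by that modality's weight."""
    return sum(weights[m] * math.dist(a[m], b[m]) for m in MODALITIES)


def top_k(query, patients, weights, k):
    """IDs of the k patients closest to `query` under the given modality weights."""
    return sorted(patients, key=lambda pid: weighted_distance(query, patients[pid], weights))[:k]


query = {"clinical": [0.0], "molecular": [0.0], "image": [0.0]}
patients = {
    "P1": {"clinical": [0.1], "molecular": [2.0], "image": [0.0]},  # clinically similar
    "P2": {"clinical": [2.0], "molecular": [0.1], "image": [0.0]},  # molecularly similar
}

clinical_heavy = {"clinical": 1.0, "molecular": 0.1, "image": 1.0}
molecular_heavy = {"clinical": 0.1, "molecular": 1.0, "image": 1.0}

print(top_k(query, patients, clinical_heavy, k=1))   # ['P1']
print(top_k(query, patients, molecular_heavy, k=1))  # ['P2']
```

The same query patient gets a different “most similar” neighbor depending only on the assumed weights. Because the downstream therapy choice is drawn from these neighbors, the recommendation can change as well.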

CONFLICT OF INTEREST

The author has no potential conflicts of interest to disclose.

References

  1. Athar A., Füllgrabe A., George N., Iqbal H., Huerta L., Ali A., Snow C., Fonseca N.A., Petryszak R., and Papatheodorou I., et al. (2019). ArrayExpress update - from bulk to single-cell expression data. Nucleic Acids Res. 47(D1), D711-D715.
  2. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., and Holko M., et al. (2013). NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 41(Database issue), D991-D995.
  3. Tryka K.A., Hao L., Sturcke A., Jin Y., Wang Z.Y., Ziyabari L., Lee M., Popova N., Sharopova N., and Kimura M., et al. (2014). NCBI's Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res. 42(Database issue), D975-D979.