
    Gao Ge's team proposed a new method for cross-modal representation learning

    • Last Update: 2023-02-03
    • Source: Internet
    • Author: User

    Biological processes in cells involve regulation at multiple levels, including DNA, RNA, and proteins, which influence one another and act in concert. Integrating the multimodal information carried by different omics data is therefore a prerequisite for comprehensively characterizing the physiological and pathological states of cells [2].

    In recent years, single-cell multi-omics technologies such as SHARE-seq [3], sci-CAR [4], inCITE-seq [5], and 10x Multiome have allowed biologists to measure several modalities/omics layers in the same cell simultaneously. Observing different modalities of the same system can further deepen our understanding of important life processes such as disease and embryonic development [6-8]. However, compared with earlier single-omics techniques, these multi-omics assays are harder to apply, more expensive, and yield lower-quality data. Developing computational methods that use single-cell multi-omics data as supervision signals while integrating the large amount of high-quality single-modality data already available would therefore be of great value to the field (Figure 1) [9].

    Figure 1 Cross-modal representation learning in single-cell omics research

    To address this problem, the team of Gao Ge at Peking University/Changping Laboratory proposed CLUE (Cross-Linked Unified Embedding), a framework for cross-modal representation learning [1]. The paper was accepted by NeurIPS 2022, a top conference in artificial intelligence, and selected for an oral presentation; the paper and code have been open-sourced.

    A common paradigm for single-cell multimodal data integration is to project data from different feature spaces into a low-dimensional space through an encoder specific to each modality, and then align the modality-specific low-dimensional representations using the pairwise supervision signals provided by multi-omics experiments. These methods share a common limitation: they ignore the fact that different modalities have different resolutions. Immune cells, for example, are characterized in fine detail by the surface-protein modality, whereas their differences in overall gene expression are relatively small. During integration, the lower-resolution gene-expression space then distorts the higher-resolution protein space, and modality-specific information is lost. In other words, the different modalities hinder rather than reinforce each other.
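
    As a rough, hedged illustration of this common paradigm (not any specific published method), the sketch below pairs one encoder per modality with an alignment loss on cells profiled by both assays; the layer sizes, feature dimensions, and names are all hypothetical.

        # Minimal sketch of the common paradigm: one encoder per modality projects
        # cells into a shared low-dimensional space, and paired cells from a
        # multi-omics assay supply the alignment (supervision) signal.
        # All names and dimensions are illustrative, not taken from any specific tool.
        import torch
        import torch.nn as nn

        class ModalityEncoder(nn.Module):
            def __init__(self, in_dim: int, latent_dim: int = 32):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(in_dim, 256),
                    nn.ReLU(),
                    nn.Linear(256, latent_dim),
                )

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return self.net(x)

        # Hypothetical feature dimensions: 2000 variable genes, 134 surface proteins.
        enc_rna = ModalityEncoder(in_dim=2000)
        enc_protein = ModalityEncoder(in_dim=134)

        def alignment_loss(x_rna: torch.Tensor, x_protein: torch.Tensor) -> torch.Tensor:
            """Row i of both tensors is the same cell, so its two embeddings
            are pulled together in the shared latent space."""
            return ((enc_rna(x_rna) - enc_protein(x_protein)) ** 2).mean()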

    To solve this problem, CLUE introduces modality-specific representation subspaces: each modality has its own subspace in which its information is learned, which removes the mutual constraints caused by the differing resolutions of the modalities. CLUE further uses autoencoders within each modality to learn the information contained in a single modality, uses cross-encoders to learn the information shared between modalities, and then integrates these modality-specific representations through mappings between the modalities (Figure 2).

    Figure 2 Schematic diagram of the CLUE model framework
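
    To make the subspace idea concrete, here is a hedged sketch (not the authors' implementation): each modality is encoded into a joint embedding formed by concatenating per-modality subspaces, each modality's decoder reads only its own subspace, and applying a decoder to an embedding produced from a different modality yields the cross-reconstruction terms. The linear layers, sizes, and names are all illustrative.

        # Hedged sketch of modality-specific subspaces with self- and cross-
        # reconstruction; illustrative only, not the CLUE code.
        import torch
        import torch.nn as nn

        MODALITIES = {"rna": 2000, "protein": 134}  # hypothetical feature dimensions
        SUB_DIM = 16                                # size of each modality's subspace
        JOINT_DIM = SUB_DIM * len(MODALITIES)       # embedding = concatenated subspaces

        encoders = nn.ModuleDict({m: nn.Linear(d, JOINT_DIM) for m, d in MODALITIES.items()})
        decoders = nn.ModuleDict({m: nn.Linear(SUB_DIM, d) for m, d in MODALITIES.items()})

        def subspace(z: torch.Tensor, modality: str) -> torch.Tensor:
            """Slice out the part of the joint embedding reserved for one modality."""
            i = list(MODALITIES).index(modality)
            return z[:, i * SUB_DIM:(i + 1) * SUB_DIM]

        def reconstruction_loss(batch: dict) -> torch.Tensor:
            """Self-reconstruction (src == tgt) plus cross-reconstruction (src != tgt):
            the target modality's data is decoded from the source modality's embedding."""
            loss = torch.tensor(0.0)
            for src, x_src in batch.items():
                z = encoders[src](x_src)
                for tgt, x_tgt in batch.items():
                    x_hat = decoders[tgt](subspace(z, tgt))
                    loss = loss + ((x_hat - x_tgt) ** 2).mean()
            return loss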

    In addition, CLUE introduces adversarial learning to eliminate residual differences between the representations of different modalities, and it minimizes the mean squared error between paired multimodal representations using the supervision signals from multi-omics data, further improving integration accuracy.
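
    A hedged sketch of how such an objective can be assembled is given below; the discriminator architecture, the loss weights, and the single-step formulation are illustrative rather than taken from the paper.

        # Illustrative combination of an adversarial term (a discriminator tries to
        # guess the source modality of each embedding, while the encoders try to
        # fool it) with a mean-squared-error term on paired cells; the weights
        # and network sizes are hypothetical.
        import torch
        import torch.nn as nn

        LATENT_DIM = 32
        discriminator = nn.Sequential(
            nn.Linear(LATENT_DIM, 64),
            nn.ReLU(),
            nn.Linear(64, 2),            # two modalities -> two classes
        )
        cross_entropy = nn.CrossEntropyLoss()

        def encoder_objective(z_rna, z_protein, lambda_adv=0.1, lambda_pair=1.0):
            """Loss minimized by the encoders: match paired cells, confuse the discriminator."""
            z_all = torch.cat([z_rna, z_protein], dim=0)
            labels = torch.cat([
                torch.zeros(len(z_rna), dtype=torch.long),     # modality 0 = RNA
                torch.ones(len(z_protein), dtype=torch.long),  # modality 1 = protein
            ])
            adv = cross_entropy(discriminator(z_all), labels)  # modality-prediction loss
            pair_mse = ((z_rna - z_protein) ** 2).mean()       # paired multimodal supervision
            return lambda_pair * pair_mse - lambda_adv * adv   # encoders work against adv

    In a full training loop the discriminator would be updated separately to minimize the adversarial term, alternating with the encoder updates that minimize the objective above.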

    In the first NeurIPS Multimodal Single-Cell Data Integration competition, CLUE won first place in the cross-modal integration task across all categories, covering single-cell chromatin accessibility, transcriptome, and surface proteome data (Figure 3) [10]. CLUE also achieved the best performance in comparisons with integration methods that were not part of the competition, such as MultiVI, Cobolt, and Bridge integration.

    The CLUE model for single-cell multi-omics has been integrated into GLUE (https://github.com/gao-lab/GLUE), the Python-based open-source package previously developed by Gao Ge's group [11]. Notably, the design of CLUE is not limited to single-cell multi-omics data and can in principle be extended to other modalities such as images, text, and audio.

    Figure 3 CLUE integration results on single-cell chromatin accessibility, transcriptome, and surface proteome data

    Tu Xinming, an undergraduate at Peking University's School of Life Sciences (currently a doctoral student at the University of Washington), and Dr. Cao Zhijie, a postdoctoral fellow at Peking University, are co-first authors of the paper; Xia Chenrui, a graduate student at Peking University, is the corresponding author, and Tu Xinming's current supervisor, Professor Sara Mostafavi of the University of Washington, is co-corresponding author. The research was supported by the National Key Research and Development Program, the State Key Laboratory of Protein and Plant Gene Research, the Beijing Future Genetic Diagnosis Advanced Innovation Center, and Changping Laboratory.

    Open source code: https://github.com/gao-lab/GLUE

    Full text: https://openreview.net/pdf?id=Tfb73TeKnJ-

    References:

    1. Tu, X.*, Cao, Z.-J.*, Xia, C., Mostafavi, S. & Gao, G. Cross-Linked Unified Embedding for cross-modality representation learning. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

    2. Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M., Hao, Y., Stoeckius, M., Smibert, P. & Satija, R. Comprehensive Integration of Single-Cell Data. Cell 177 (2019).

    3. Ma, S., Zhang, B., LaFave, L. M., Earl, A. S., Chiang, Z., Hu, Y., Ding, J., Brack, A., Kartha, V. K., Tay, T., Law, T., Lareau, C., Hsu, Y.-C., Regev, A. & Buenrostro, J. D. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell 183, 1103-1116.e20 (2020).

    4. Cao, J., Cusanovich, D. A., Ramani, V., Aghamirzaie, D., Pliner, H. A., Hill, A. J., Daza, R. M., McFaline-Figueroa, J. L., Packer, J. S., Christiansen, L., Steemers, F. J., Adey, A. C., Trapnell, C. & Shendure, J. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380-1385 (2018).

    5. Chung, H., Parkhurst, C. N., Magee, E. M., Phillips, D., Habibi, E., Chen, F., Yeung, B. Z., Waldman, J., Artis, D. & Regev, A. Joint single-cell measurements of nuclear proteins and RNA in vivo. Nat Methods 18, 1204-1212 (2021).

    6. Janssens, J., Aibar, S., Taskiran, I. I., Ismail, J. N., Gomez, A. E., Aughey, G., Spanier, K. I., Rop, F. V. D., González-Blas, C. B., Dionne, M., Grimes, K., Quan, X. J., Papasokrati, D., Hulselmans, G., Makhzami, S., Waegeneer, M. D., Christiaens, V., Southall, T. & Aerts, S. Decoding gene regulation in the fly brain. Nature 1-7 (2022). doi:10.1038/s41586-021-04262-z

    7. Argelaguet, R., Clark, S. J., Mohammed, H., Stapel, L. C., Krueger, C., Kapourani, C.-A., Imaz-Rosshandler, I., Lohoff, T., Xiang, Y., Hanna, C. W., Smallwood, S., Ibarra-Soria, X., Buettner, F., Sanguinetti, G., Xie, W., Krueger, F., Göttgens, B., Rugg-Gunn, P. J., Kelsey, G., Dean, W., Nichols, J., Stegle, O., Marioni, J. C. & Reik, W. Multi-omics profiling of mouse gastrulation at single-cell resolution. Nature 576, 487-491 (2019).

    8. Welch, J. D., Kozareva, V., Ferreira, A., Vanderburg, C., Martin, C. & Macosko, E. Z. Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity. Cell 177 (2019).

    9. Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat Biotechnol 39, 1202-1215 (2021).

    10. Lance, C., Luecken, M. D., Burkhardt, D. B., Cannoodt, R., Rautenstrauch, P., Laddach, A., Ubingazhibov, A., Cao, Z.-J., Deng, K., Khan, S., Liu, Q., Russkikh, N., Ryazantsev, G., Ohler, U., NeurIPS 2021 Multimodal data integration competition participants, Pisco, A. O., Bloom, J., Krishnaswamy, S. & Theis, F. J. Multimodal single cell data integration challenge: results and lessons learned. bioRxiv 2022.04.11.487796 (2022). doi:10.1101/2022.04.11.487796

    11. Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat Biotechnol 1-9 (2022). doi:10.1038/s41587-022-01284-4
