Transcriptome Analysis Identifies GATA3-AS1 as a Long Noncoding RNA Associated with Resistance to Neoadjuvant Chemotherapy in Locally Advanced Breast Cancer Patients


Breast cancer (BC) is one of the main causes of death in women worldwide, with >600,000 deaths annually (https://gco.iarc.fr/today/home, last accessed February 25, 2021). Furthermore, as the most common cancer in women, it constitutes a public health burden.1 In particular, patients with locally advanced breast cancer (LABC) represent a heterogeneous group with variable local recurrence and global survival.2-4 These patients have a significant risk for local recurrence and metastatic progression, in addition to presenting low rates of global survival compared with patients with BC at early stages (stage I to IIA). Hence, a comprehensive approach to LABC patients to achieve local and distant control of disease has become a challenge, as well as monitoring disease progression and treatment efficacy.5 On the other hand, pathologic complete response (pCR) is one of the most important parameters to consider in patient prognosis. However, different studies have shown that <50% of patients with LABC achieve pCR after neoadjuvant chemotherapy (NAC).6 NAC was initially used in the context of LABC because it has several advantages, such as making inoperable tumors surgically resectable (stage T4, N2, or N3) and increasing the rates of breast-conserving surgery.6 Although some biomarkers have proved useful for improving treatment efficiency, most are still in the clinical testing stage and have not yet been approved for standardized use.7-9 Expression status of estrogen receptor (ER), progesterone receptor, and ERBB2 [human epidermal growth factor receptor 2 (HER2)] by immunohistochemical evaluation is currently the gold standard for determining management and response to treatment.10 In addition, gene expression panels, such as Oncotype Dx and Mammaprint, are examples of sets of biomarkers used to provide further clinical support by predicting response to chemotherapy.11 Nevertheless, these types of tests have mostly been restricted to adjuvant chemotherapy, whereas only a few biomarkers, such as Ki-67,12-14 have been proposed for response to NAC.
Recently, in addition to gene expression of coding genes, it has been proposed that noncoding transcripts, such as long noncoding RNAs (lncRNAs), may also serve as molecular markers for BC diagnosis and prognosis.15-17 These transcripts are defined as having >200 bases and lacking open reading frames, making them unable to be translated into proteins.18 Therefore, transcriptome analysis by ab initio assembly established the potential importance of lncRNAs in cancer, suggesting that this kind of noncoding transcript could be useful in cancer pathogenesis and biomarker development.19 The lncRNA HOTAIR has been proposed as a potential prognostic biomarker, and its overexpression was associated with metastasis-free survival and overall survival (OS), suggesting that this noncoding transcript is a powerful predictor of metastasis and death.16 Another example is the ER-regulated lncRNA DSCAM-AS1, which is overexpressed in ER-positive tumors and was shown to be of clinical relevance as a good predictor of tumor progression and tamoxifen resistance.20 In another comprehensive analysis of RNA sequencing (RNA-Seq) data from The Cancer Genome Atlas, Berger et al 15 constructed gene correlation networks and detected significant gene-lncRNA interactions in breast cancer between coding genes (ESR1 and DKC1) and lncRNAs (NEAT1, TUG1, and TERC). Despite their potential, to date, few studies have investigated or reported lncRNAs as biomarkers of response to systemic therapies in BC 21 and specifically to NAC in BC.22-24 Therefore, focusing on lncRNAs may aid in identifying novel and more accurate biomarkers for predicting the response to systemic neoadjuvant therapy in BC.
In this study, RNA-Seq profiling and machine learning were first used to identify a group of lncRNAs that are differentially expressed in NAC-resistant Mexican patients with LABC compared with NAC-sensitive patients (nonresponders and responders, respectively) and that were also predictive and stable features in a random forest model. In particular, GATA3-AS1 was identified as a divergent lncRNA that acts as a predictive biomarker of the response to NAC. Expression profiling by RT-qPCR on a larger cohort (n = 68) confirmed that GATA3-AS1 is overexpressed only in nonresponder patients. In addition, univariate and multivariate analyses established that GATA3-AS1 distinguishes between nonresponder and responder patients with a sensitivity of 92.9%, a specificity of 75.0%, and an area under the curve (AUC) of approximately 0.90. Finally, GATA3-AS1 is suggested as a novel biomarker for predicting NAC response in patients with LABC. This study provides the first evidence of this lncRNA as a predictive biomarker of NAC response in breast cancer patients with positive hormonal receptors that correspond to luminal B-like HER2-positive and HER2-negative phenotypes.

Breast Sample Collection
Eleven RNA samples, obtained from biopsies of female patients diagnosed with locally advanced mammary adenocarcinoma (stage IIB to IIIC) belonging to the Mexican National Cancer Institute population who were candidates for the administration of neoadjuvant chemotherapy, were sequenced by RNA-Seq. A validation cohort was collected from the biopsies of another 68 patients diagnosed with primary breast tumors, all of whom were patients at the Breast Tumor Division in the Mexican National Cancer Institute between January 2012 and December 2015. Samples were collected from tissue of the initial biopsy (taken with a thick needle) before treatment began (Figure 1). All patients were previously confirmed with primary breast carcinoma and locally advanced disease (clinical stage IIA to IIIC) by histologic studies. Patient selection was performed with the support of the breast tumor department at the Mexican National Cancer Institute, which includes oncologists and pathologists.

lncRNA Predicts Chemotherapy Resistance
The Journal of Molecular Diagnostics - jmdjournal.org
All patients included in this study received systemic NAC according to the recommendations of the National Comprehensive Cancer Network guidelines. Chemotherapy regimens were based on anthracyclines and taxanes in sequential scheduling, as described below: four taxane cycles every 21 days (paclitaxel, 80 mg/m2, on days 1, 8, and 14; or docetaxel, 100 mg/m2, on day 1) followed by four doses of fluorouracil-Adriamycin-cyclophosphamide every 21 days (fluorouracil, 500 mg/m2, Adriamycin, 50 mg/m2, and cyclophosphamide, 500 mg/m2, on day 1). Subsequently, all patients underwent local control by mastectomy or breast conservation surgery. At the end of the neoadjuvant regimen, treatment response was evaluated in surgical specimens obtained by the oncopathologist. A pCR was classified according to the most accepted definition: the absence of infiltrating components in the breast and lymph nodes (ypT0/is or ypN0). Luminal B-like phenotype was assigned according to the expression of estrogen and progesterone receptors (>1%), in absence or presence of HER2 overexpression. If HER2 was not overexpressed, the expression of Ki-67 (≥20%) was considered for subtyping.25 In this study, patients were classified as responders if they presented pCR and nonresponders if they did not show pCR. Informed consent was obtained, and this study was approved by the ethical and research committee of the Mexican National Cancer Institute (018/055/DII CEI/1302/18).

External RNA-Seq Data Collection of Breast Cancer Cell Lines
RNA-Seq results of the breast cancer cell lines were obtained from the data set contained in the Breast Cancer Profiling Project, Gene Expression 1: baseline mRNA sequencing on 35 cell lines, which forms part of the Library of Integrated Network-Based Cellular Signatures, which includes 33 breast cancer cell lines and 2 transformed noncancerous breast cell lines (http://lincs.hms.harvard.edu/db/datasets, last accessed May 14, 2020).

RNA-Seq Data Analysis
Figure 1. A: The study was divided into two phases, discovery and validation, with snap-frozen pretreatment core needle biopsies obtained from primary breast cancer patients who responded to chemotherapy and from those who did not respond to treatment (responder and nonresponder patients, respectively). B: From them, 11 patients participated in the discovery phase, and the RNA from their samples was used to construct a poly-A library for paired-end RNA sequencing (Materials and Methods). After bioinformatic analysis of the sequencing data, including a random forest machine learning approach, the differentially expressed genes, especially overexpressed long noncoding RNAs (lncRNAs), were identified and selected as potential prediction biomarkers. C: Among them, lncRNAs were identified and selected for the validation phase by RT-qPCR analysis and were also validated in the public databases TANRIC (https://www.tanric.org, last accessed February 25, 2021) and The Cancer Genome Atlas (https://portal.gdc.cancer.gov, last accessed February 25, 2021). Image was generated with BioRender.com. LABC, locally advanced breast cancer; NAC, neoadjuvant chemotherapy; RNA-Seq, RNA sequencing.
The quality of the sequencing files was determined with reports generated using FastQC 26,27 version 0.11.9. Filtering of low-quality reads and adapter removal were performed with Trimmomatic version 0.39.28 Reads were mapped to the human reference genome assembly version hg38 using STAR aligner version 2.7.1a.29 Gene expression quantification was performed with featureCounts from the Rsubread package version 1.34.7 on the STAR bam files. Gene expression quantification was also performed by aligning to the transcriptome with Salmon version 0.14.1.30 All gene quantification was performed on GENCODE version 31 annotations. Tumor purity was inferred using the ESTIMATE algorithm 31 version 2.0. Differential expression analysis was performed with DESeq2 32 version 1.24 and was subsequently filtered to analyze only the subset of lncRNAs. The cutoff points were a log2 fold change (FC) value of 1.5 for overexpressed lncRNAs and -1.5 for underexpressed lncRNAs (false discovery rate < 0.05). To corroborate grouping of the patient tumor samples, principal component analysis was performed on the basis of the expression profile of the whole transcriptome, where a batch effect for ER status was detected and added as a covariate in the DESeq2 design (~ER_status + response).
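The differential expression cutoffs stated in the RNA-Seq data analysis paragraph can be written as a small filter. The following is a minimal Python sketch, not the authors' code: the function name and the example values are hypothetical, and it assumes the log2 fold change and the false discovery rate have already been exported from the DESeq2 results.

```python
import math

def classify_lncrna(log2_fc, fdr, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Label one lncRNA using the cutoffs given in the text.

    log2_fc and fdr are assumed to come from a DESeq2 results table
    (nonresponders versus responders); the names are hypothetical.
    """
    # A missing or non-significant adjusted P value is never called DE.
    if fdr is None or math.isnan(fdr) or fdr >= fdr_cutoff:
        return "not_significant"
    if log2_fc > fc_cutoff:
        return "overexpressed"
    if log2_fc < -fc_cutoff:
        return "underexpressed"
    return "not_significant"

# Hypothetical values, for illustration only.
examples = {"lncA": (3.0, 0.01), "lncB": (-2.1, 0.03), "lncC": (2.5, 0.20)}
labels = {name: classify_lncrna(fc, fdr) for name, (fc, fdr) in examples.items()}
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the 79 reported lncRNAs would be to a different cutoff.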
Random forest models were trained on the log-transformed normalized counts (rlog function with the blind = FALSE parameter) to predict patient response, using the package randomForest 33 version 4.6.14 in a leave-two-out cross-validation scheme, where two samples (one responsive and one nonresponsive) were always left out. The mean decrease in accuracy was stored for each gene and each fold. To assess the stability of the feature importance, the quartile coefficient of dispersion (QCD) of the mean decrease in accuracy over all folds was computed for each gene:
$$\mathrm{QCD} = \frac{Q_3 - Q_1}{Q_3 + Q_1},$$
where $Q_3$ and $Q_1$ are the third and first quartiles, respectively, of the gene's mean decrease in accuracy.
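The leave-two-out scheme and the quartile coefficient of dispersion described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation (which used the R package randomForest): the sample identifiers are hypothetical, the text does not say whether every responder/nonresponder pair was enumerated, and the quartile convention (here Python's "inclusive" method) is an assumption.

```python
from itertools import product
import statistics

def leave_two_out_folds(responders, nonresponders):
    """Yield (train, held_out) pairs, holding out one sample of each class."""
    all_ids = list(responders) + list(nonresponders)
    for r, n in product(responders, nonresponders):
        held_out = (r, n)
        train = [s for s in all_ids if s not in held_out]
        yield train, held_out

def quartile_coefficient_of_dispersion(values):
    """QCD = (Q3 - Q1) / (Q3 + Q1) of one gene's importance across folds."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q3 + q1 == 0:
        return float("nan")  # undefined when the quartiles cancel out
    return (q3 - q1) / (q3 + q1)

# Hypothetical sample identifiers matching the discovery cohort sizes.
responders = ["R1", "R2", "R3", "R4"]
nonresponders = ["N1", "N2", "N3", "N4", "N5", "N6", "N7"]
folds = list(leave_two_out_folds(responders, nonresponders))
```

With 4 responders and 7 nonresponders, exhaustive pairing gives 28 folds of 9 training samples each. A QCD close to zero indicates that a gene's importance barely changes between folds; because the mean decrease in accuracy can be negative, the ratio is only meaningful when Q1 and Q3 are positive.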
Pathway enrichment analysis with ConsensusPathDB 34 was performed using the Bioconductor package clusterProfiler 35 on all genes identified as differentially expressed (DE) in at least one of the two quantification methods, using all expressed genes as background. Gene set enrichment analysis 36 was performed on all expressed genes ordered by their DE -log10 P value from DESeq2. Pathway enrichment of lncRNAs was performed using the LncPath (LncRNAs2Pathways 37) package version 1.1.

RNA Isolation and Quantitative Real-Time PCR Assays
MCF-10A, MCF-7, BT474, and MDA-MB-231 cell cultures were performed following respective ATCC (Manassas, VA) culture protocols. Total RNA was isolated from cultured cells using TRIzol (Thermo Fisher Scientific, Waltham, MA), according to the manufacturer's instructions. For patient samples, RNA was isolated using the AllPrep kit (Qiagen, Germantown, MD; number 80204); RNA concentration and quality analysis (RNA integrity number value) were performed by a Tape Station 2200 bioanalyzer (Agilent Technologies, Santa Clara, CA). Then, 1 μg RNA was treated with DNase I, RNase free (molecular biology; Thermo Fisher Scientific; reference EN0525). cDNA was synthesized from 1 μg total RNA using a High Capacity cDNA Reverse Transcription kit (Applied Biosystems, Thermo Fisher Scientific; reference 4368814).
Finally, real-time PCR was performed with SYBR Green/ROX qPCR Master Mix (molecular biology; Thermo Fisher Scientific; reference K0223) on a QuantStudio 3 Real-Time PCR System (Applied Biosystems, Thermo Fisher Scientific). Relative gene expression was determined by fold change calculation (ΔΔCt) for cell lines and ΔCt for patient samples, normalized to RPS28 as a housekeeping gene. The primers used are listed in Table 1.
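The ΔCt and ΔΔCt calculations mentioned above follow a standard form, sketched here in Python. This is an illustrative sketch rather than the authors' analysis script: the function names and Ct values are hypothetical, and the 2^(-ΔΔCt) conversion assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
def delta_ct(ct_target, ct_reference):
    """Delta Ct: target Ct normalized to the housekeeping gene (RPS28 in the text)."""
    return ct_target - ct_reference

def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression of a sample versus a calibrator by 2^(-ddCt)."""
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_calibrator, ct_ref_calibrator))
    return 2.0 ** (-ddct)

# Hypothetical Ct values: a cancer cell line versus a calibrator line
# such as MCF-10A. A lower Ct means more transcript.
fc = fold_change_ddct(ct_target_sample=22.0, ct_ref_sample=18.0,
                      ct_target_calibrator=27.0, ct_ref_calibrator=18.0)
```

In this toy case the target amplifies five cycles earlier in the sample than in the calibrator at equal reference Ct, which corresponds to a 32-fold higher relative expression. For patient samples, where no calibrator is used, only `delta_ct` applies, and a lower ΔCt indicates higher expression.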

RNA-Seq
To ensure good quality of the samples for sequencing, Nanodrop, Qubit 2.0 (Life Technologies, Carlsbad, CA), and the Agilent 2100 Bioanalyzer (Agilent Technologies) were used to detect the purity, concentration, and integrity of RNA samples, respectively. All samples had RNA integrity numbers >8.0. A total of 1 μg RNA was used to generate sequencing libraries using the TruSeq Stranded mRNA library prep kit from Illumina, Inc. (San Diego, CA), according to the manufacturer's instructions. After construction of the libraries, their concentrations and insert sizes (approximately 260 bp) were detected using Qubit 2.0 and the Agilent 2100 Bioanalyzer, respectively. The library was then sequenced using an Illumina HiSeq 2500 sequencer with paired-end 2 × 125 cycles using Illumina TruSeq version 4 sequencing by synthesis (SBS) chemistry and following the manufacturer's instructions. The depth of sequencing was >25 million reads. RNA-Seq data are available from the National Center for Biotechnology Information Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo, accession number GSE159448).

Immunohistochemistry
Primary antibody for GATA3 protein was obtained from BioSB (mouse monoclonal; clone EP368; catalog number BSB 3333; BioSB, Santa Barbara, CA). The reaction was performed on a Benchmark Ultra automated immunohistochemistry device (Ventana Medical Systems, Tucson, AZ). Briefly, the slides were deparaffinized at 72°C for 4 minutes on xylene-free dewaxing reagent (EZprep; Ventana Medical Systems). Antigen retrieval followed, for 8 minutes at 95°C in alkaline solution (CC1; Ventana Medical Systems). Primary antibody was incubated for 16 minutes at 36°C, diluted 1:50 (DaVinci Green diluent; catalog number PD900; Biocare Medical, Pacheco, CA). Finally, enhanced polymer-based detection with diaminobenzidine chromogen was employed (Optiview DAB Kit; Ventana Medical Systems). In between all steps, thorough washings were made (Reaction Buffer; Ventana Medical Systems). The slides were then manually counterstained with Harris hematoxylin and mounted with nonaqueous medium.

Statistical Analysis
Descriptive statistics of the main demographic and clinical variables were performed, presenting the median and interquartile range of continuous variables and the proportion of qualitative variables. Differences in the variables collected in the study groups (response versus no response) were identified according to the type of variable with the U-test or the χ2 test. Identification of the effect of overexpression or down-regulation of GATA3-AS1 on clinical response was adjusted to the main clinical variables and was performed by statistical analysis of 66% with the sample available. A P < 0.05 was considered statistically significant. Analysis of variance, followed by the Tukey test, was performed to determine significant differences in GATA3-AS1 expression among different cell lines, in RNA-Seq results analysis, and in quantitative real-time PCR relative expression analysis. For clinical data, GATA3-AS1 expression differences between responder and nonresponder groups were determined by t-test (two tailed, nonpaired, 95% confidence level), assuming variance homogeneity (determined by the Levene test, P = 0.06). This was verified by the Fisher test and χ2 test (P < 0.05). In addition, because of sample size, normal distribution was determined by Lilliefors, Kolmogorov-Smirnov, and Shapiro-Wilk tests, indicating that the sample did not show normal distribution (P > 0.05). For that reason, a nonparametric U-test/Wilcoxon test was implemented, showing differences between medians (P < 0.05).
For The Atlas of Noncoding RNAs in Cancer (TANRIC, https://www.tanric.org, last accessed February 25, 2021) survival plot analysis, Cox regression was implemented to determine survival time, and the log-rank test was used to compare survival distribution. Significance was defined as P < 0.05. To determine the specificity and sensitivity of GATA3-AS1 overexpression in neoadjuvant resistance prediction, receiver operating characteristic curve analysis was performed. For this analysis, the STATA software version 14 (StataCorp, College Station, TX) was used.
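The quantities reported from the receiver operating characteristic analysis (sensitivity, specificity, and AUC) can be illustrated with a short, dependency-free Python sketch. It is not a reproduction of the STATA analysis: the scores and labels below are invented toy data, nonresponders are assumed to be the positive class, and a larger score is assumed to indicate stronger evidence of nonresponse.

```python
def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = nonresponder (positive class), 0 = responder."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, for illustration only (not patient measurements).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination over all thresholds, which is why the three numbers are reported together.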

Clinicopathologic Characteristics of Locally Advanced Breast Cancer Patients
This study was primarily focused on identifying biomarkers to predict the response to NAC in patients with LABC within the luminal B-like phenotype. The main features that define each patient are the molecular subtype, age, clinical stage, and response to treatment, among others (Table 2). Expression of hormonal receptors corresponds to histologic classification of the molecular luminal B-like HER2-positive and HER2-negative phenotype (Materials and Methods).

Transcriptome Profiling of LncRNAs in Breast Cancer Patients Is Associated with Nonresponder Patients
With the objective of detecting lncRNAs that could potentially serve as molecular biomarkers to predict response to NAC, transcriptome profiling was performed in a poly(A)-enriched population of RNAs using RNA-Seq. The discovery phase included snap-frozen pretreatment core needle biopsies from primary breast cancer patients who responded to NAC (n = 4) and from those who did not respond to the systemic treatment (n = 7), referred to as responder and nonresponder patients, respectively. Principal component analysis considering the complete expression profile, including mRNAs and lncRNAs, showed no distinction between groups; however, grouping based on ER status was observed and was adjusted for in subsequent analyses (Supplemental Figure S1). Next, DE analysis was performed comparing nonresponder patients with responder patients and identified 69 lncRNAs that were underexpressed and 10 lncRNAs that were overexpressed, according to the established cutoff point (|log2FC| > 1.5 and false discovery rate < 0.05) (Figure 2A). Unsupervised hierarchical clustering of patients shows that these DE lncRNAs place responders and nonresponders into two well-defined groups (Figure 2B).
Given that most of the lncRNAs identified in this study are synthesized from the antisense strand and that there is limited information about their biological functions or molecular characterization, the FARNA server (https://www.cbrc.kaust.edu.sa/farna, last accessed February 25, 2021) was used to investigate their association with different cell types. This server infers the function of noncoding RNAs based on the function of their coexpressed genes in multiple data sets (Supplemental Table S1). Results demonstrated that most lncRNAs overexpressed in the nonresponder group participate in processes such as chromatin remodeling, miRNA interactions, and cancer progression. In contrast, most underexpressed lncRNAs were related to apoptosis and interacting competing endogenous RNA networks. Because most lncRNAs found in this study had not been previously reported or functionally characterized, the study was focused on GATA3-AS1 (FC, 3.02), first because this lncRNA is the only lncRNA that was overexpressed in all nonresponder patients (Figure 2B), and second, because it was the only lncRNA that had been previously well characterized in lymphocytes.38 In addition, this lncRNA is near GATA3, an important gene in breast cancer.39 Interestingly, in RNA-Seq differential expression analyses between responder and nonresponder patients, GATA3 is not included in the significantly overexpressed genes in nonresponder patients (Supplemental Table S2).

Overexpressed and Underexpressed LncRNAs, as well as Machine Learning Analysis, Define Patients Who Are Nonresponders to NAC
RNA sequencing is a robust tool for measuring all transcripts, especially when used for identifying differentially expressed genes or noncoding RNA genes, such as lncRNAs, between sample groups.40 Depending on the pipeline used in RNA-Seq analysis, an incorrect estimate of transcript abundance can be obtained, indicating that differences between pipelines contribute to overall uncertainty in estimates of transcript abundance.41 Given this premise, two different pipelines were used to analyze the RNA sequencing data: Salmon + DESeq2 and STAR + featureCounts + DESeq2 were used to identify differential expression of lncRNAs. With the Salmon pipeline, 70 underexpressed lncRNAs were obtained compared with 64 lncRNAs with the STAR pipeline, and 40 lncRNAs appeared in both methods (Figure 3A and Supplemental Table S3). For overexpressed lncRNAs, when the Salmon pipeline was applied, 10 lncRNAs were found compared with 6 when the STAR pipeline was used, with an overlap of 4 lncRNAs between the two methods (Figure 3B and Supplemental Table S4). Interestingly, one of the overexpressed lncRNAs that coincided in the two pipelines was GATA3-AS1, the same lncRNA overexpressed in all nonresponder patients. Furthermore, to identify lncRNAs associated with response to NAC, random forest models were built on subsets of patient samples by leaving two patients out (Materials and Methods), and the mean decrease in accuracy of each variable was recorded for each model (Figure 3C). The top 50 genes sorted by median mean decrease in accuracy were selected, of which only 6 were also found to be differentially expressed. This highlights the importance of considering nonlinear predictive models for biomarker discovery in complex and heterogeneous diseases, such as cancer. To assess the stability of the feature importance in the random forest models, the quartile coefficient of dispersion was calculated (Figure 3C). It was observed that GATA3-AS1 exhibited the most stable importance across the different models. Given its high importance scores and stability across the random forest models and its detection as DE, GATA3-AS1 was selected as a top candidate for further validation. [...] (Figure 4A). Even in the expression analysis from RNA-Seq data quantified by transcript per million, GATA3-AS1 was overexpressed only in nonresponder patients (Figure 4B), suggesting its importance as a prediction biomarker of response to NAC in LABC patients with a luminal B-like phenotype.
In addition, to evaluate the oncogenic potential and tissue specificity of GATA3-AS1, expression validation in breast cancer cell lines was needed. RNA-Seq data from the Library of Integrated Network-Based Cellular Signatures (http://lincs.hms.harvard.edu/db/datasets, last accessed May 14, 2020) were used; this library includes 33 breast cancer cell lines and 2 transformed noncancerous breast cell lines (see Materials and Methods). RNA-Seq histograms established that several breast cancer cell lines overexpressed GATA3-AS1, whereas MCF10A, a nontumorigenic epithelial cell line, exhibited basal expression levels of GATA3-AS1 (Figure 4C). Interestingly, in bar plots where GATA3-AS1 expression levels were measured by reads per kilobase of transcript per million reads mapped (RPKM) (Figure 4D), neoplastic cell lines, such as MDA-MB-157, MDA-MB-436, HCC1395, and CAL51, among others, showed basal expression levels similar to those of the noncancerous breast cell lines MCF10A and HME1. However, in the other 27 breast cancer cell lines analyzed, GATA3-AS1 was overexpressed; in particular, MCF7 and T47D breast cancer cell lines showed the highest expression levels of this lncRNA, suggesting that GATA3-AS1 is a tissue- and stage-specific overexpressed lncRNA. Regarding GATA3, the adjacent coding gene, its expression was similar between responder and nonresponder patients (Supplemental Figure S2, A and B). Also, GATA3 was widely expressed among the analyzed breast cancer cell lines (Supplemental Figure S2, C and D). Similarly, GATA3 expression was widespread in the different cancer types analyzed (Supplemental Figure S3). On the contrary, analysis of RNA-Seq data from MiTranscriptome (http://mitranscriptome.org, last accessed February 25, 2021) demonstrated that GATA3-AS1 is only overexpressed in breast and bladder cancer tissues, establishing a highly cancer-specific expression pattern and highlighting its importance in breast cancer (Supplemental Figure S4). In addition, this lncRNA is overexpressed in positive hormonal receptor phenotypes from RNA-Seq data obtained from TANRIC (Supplemental Figure S5), suggesting a cancer-specific expression pattern in positive hormone receptor phenotypes. Interestingly, GATA3 was overexpressed in hormone-positive subtypes (Supplemental Figure S6) in a similar manner to the expression pattern seen for GATA3-AS1.

Higher Relative Expression Levels of GATA3-AS1 Are Associated with Response to NAC
Once overexpression of GATA3-AS1 was determined in breast cancer cell lines and nonresponder patients through RNA-Seq analysis, expression levels of GATA3-AS1 were examined by RT-qPCR in human breast cancer cell lines and in samples from an independent cohort of patients. For breast cancer cell lines, results showed that the MCF-7 cell line exhibited increased expression levels of GATA3-AS1 by >300-fold, followed by 100-fold overexpression in the MDA-MB-231 and BT474 cell lines, which presented 40-fold lower expression values of GATA3-AS1 than the normal MCF10A cell line (Figure 5A). Hence, results revealed that all breast cancer cell lines analyzed in this study displayed higher expression of GATA3-AS1 when their levels of expression were compared with nonneoplastic cell lines, such as MCF-10A.
Once expression levels of GATA3-AS1 in human breast cancer cell lines were validated, its expression levels were analyzed by RT-qPCR in a validation cohort of 68 patients that included nonresponders and responders to NAC treatment within the luminal B-like phenotype. It was found that nonresponder patients overexpressed GATA3-AS1, whereas responder patients exhibited underexpression of GATA3-AS1 (Figure 5B). Subsequently, specificity and sensitivity evaluations were performed, constructing an adjusted receiver operating characteristic curve. Figure 5C shows receiver operating characteristic curve analysis to evaluate the predictive capacity of GATA3-AS1 expression by RT-qPCR between nonresponder and responder patients. The analysis demonstrates that GATA3-AS1 has a sensitivity of 92.9% and a specificity of 75% with an AUC of approximately 0.90, which indicates that GATA3-AS1 distinguishes patients who will not respond to NAC from those who will respond to systemic treatment (nonresponders versus responders, respectively). To determine whether there is a relationship between GATA3-AS1 expression and tumor purity, RNA-Seq data analysis was used. As shown in Figure 5D, there was no relation between GATA3-AS1 and tumor purity, suggesting that the association between GATA3-AS1 overexpression and NAC response is independent of tumor purity.
In addition, it was determined that GATA3-AS1 and its adjacent coding gene GATA3 were co-expressed in the cohort of Mexican patients (Spearman correlation = 0.63) (Supplemental Figure S7A) as well as in the public database TANRIC (Spearman correlation = 0.80) (Supplemental Figure S7B). Interestingly, GATA3 showed overexpression in breast cancer cell lines (Supplemental Figure S8A), but this coding gene was not significantly overexpressed in nonresponder patients compared with responder ones (Supplemental Figure S8B). Furthermore, GATA3 showed low sensitivity of 52.4%, with a specificity of 73.9% and an AUC of 0.60 (Supplemental Figure S8C). Besides, no relationship was found between GATA3 expression and tumor purity (Supplemental Figure S8D). Finally, GATA3 protein expression by immunohistochemistry was not shown to distinguish responders from nonresponders (Supplemental Figure S9), suggesting that, in LABC within the luminal B-like phenotypes, GATA3 is not a good molecular marker for predicting the response to NAC, contrary to GATA3-AS1. Taken together, these results suggest that GATA3-AS1 is overexpressed only in nonresponder patients, indicating that GATA3-AS1 represents a potential biomarker of NAC resistance in patients with LABC within the luminal B-like phenotypes who do not respond to systemic treatment.

GATA3-AS1 Is a Predictive Molecular Biomarker of Response to NAC in Breast Cancer Patients
To determine the role of GATA3-AS1 in prognosis, a Kaplan-Meier curve was generated. No relationship was detected between GATA3-AS1 overexpression and OS when the cohort of 68 Mexican patients was used, suggesting that this lncRNA is not a prognostic factor in LABC patients within the luminal B-like phenotype (Supplemental Figure S10). Likewise, when the relationship of GATA3-AS1 with OS was examined in TANRIC RNA-Seq data for luminal B phenotypes (Supplemental Figure S11A) and for all molecular subtypes (Supplemental Figure S11B), no relationship with OS was observed. Therefore, these results suggest that GATA3-AS1 is not a prognostic biomarker in the cohorts analyzed. Furthermore, multivariate logistic regression showed that GATA3-AS1 overexpression was an independent predictor of response adjusted by menopausal status and phenotype, proving to be statistically significant in this model with 37.49-fold (95% CI, 6.74-208.42) more probability in nonresponders with GATA3-AS1 compared with responder patients who did not express GATA3-AS1 (Table 3). Finally, from all analyses and results obtained in this study, GATA3-AS1 is proposed as a novel divergent lncRNA that may serve as a potential predictive molecular biomarker of response to NAC, which could be included in clinical practice to manage Mexican LABC patients with a luminal B-like phenotype.
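The wide interval reported for the logistic regression estimate is easier to read on the log scale. The Python sketch below assumes, without confirmation from the text, that 37.49 is an odds ratio and that the interval is a Wald 95% CI, in which case the point estimate should sit at the geometric mean of the two bounds. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile

def se_from_ci(lower, upper, z=Z_95):
    """Standard error of the log odds implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def wald_ci(odds_ratio, se, z=Z_95):
    """Rebuild the interval from an odds ratio and a log-scale standard error."""
    beta = math.log(odds_ratio)
    return math.exp(beta - z * se), math.exp(beta + z * se)

# Values reported in the text; their interpretation as an odds ratio
# with a Wald interval is an assumption of this sketch.
se = se_from_ci(6.74, 208.42)
low, high = wald_ci(37.49, se)
geometric_mean = math.sqrt(6.74 * 208.42)
```

Under these assumptions the implied log-scale standard error is about 0.88, and the geometric mean of the bounds is about 37.48, which matches the reported estimate to rounding. A standard error of that size is expected for a cohort of 68 patients and explains why the interval spans more than an order of magnitude.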

Functional Analysis of Long Noncoding RNAs in Nonresponder Patients to NAC
To identify potentially affected pathways in nonresponder patients, lncRNA Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis was performed with LncRNAs2Pathways using the top 79 differentially expressed lncRNAs between the nonresponder and responder groups identified in the RNA-Seq data. The term taste transduction was significantly enriched for underexpressed lncRNAs (P < 0.05), whereas olfactory transduction and renin-angiotensin system were significantly enriched for overexpressed lncRNAs (P < 0.05) (Figure 6). Overall, the most significantly enriched terms were taste transduction, Parkinson disease, Alzheimer disease, oxidative phosphorylation, and regulation of autophagy (Figure 6). These results indicate that lncRNAs may influence these pathways and biological processes associated with NAC response; however, more analysis is needed to confirm this.
To determine functional enrichment of the differentially expressed mRNAs in the RNA-Seq data, pathway enrichment analyses were performed using clusterProfiler on a collection of pathways from multiple sources compiled by ConsensusPathDB.34 The top 20 functionally enriched biological processes (Gene Ontology terms) obtained from the clusterProfiler analysis of the 214 differentially expressed mRNAs are shown in a bar chart. The most significantly enriched biological processes were mesodermal commitment pathway, transmission across chemical synapses, and neuronal system (P < 0.05) (Figure 7A). The interaction networks between enriched biological processes were analyzed, yielding an interaction network among the biological processes related to mesodermal commitment pathway and cell differentiation, as well as neuronal system and transmission across chemical synapses, among others (Figure 7B). To identify genes involved in each functionally enriched biological process, a heat map of enriched mRNAs was constructed (Figure 7C). Finally, the network of gene pathways with significantly altered expression (FC > 1.5 and P < 0.05) was delineated using clusterProfiler (Figure 7D). These results suggest that mRNAs may influence these pathways and biological processes associated with NAC response.
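The over-representation functions in tools such as clusterProfiler are based on a hypergeometric test, which can be illustrated with a short calculation. The following Python sketch is not the pipeline used in this study: the function name, the gene universe size, the pathway size, and the overlap are hypothetical values chosen for illustration, and only the count of 214 differentially expressed mRNAs is taken from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    n_universe: number of genes in the background set
    n_pathway:  number of background genes annotated to the pathway
    n_selected: number of differentially expressed genes
    n_overlap:  number of differentially expressed genes in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 214 differentially expressed mRNAs (from the text),
# an assumed background of 20,000 genes, an assumed pathway of 150 genes,
# and an assumed overlap of 8 genes.
p_value = hypergeom_enrichment_p(
    n_universe=20000, n_pathway=150, n_selected=214, n_overlap=8
)
print(f"Enrichment P value: {p_value:.3g}")
```

In practice, P values from many pathways would also be adjusted for multiple testing, as indicated by the adjusted P values reported in Figure 6.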

Discussion
In general, lncRNAs are differentially expressed among human tissues.19 These expression profiles are related to the different roles of lncRNAs in the cellular physiology of tissues, and alterations in the transcriptional rate of lncRNAs could lead to the development of pathologies, such as cancer.42 Transcriptomic analysis of lncRNAs in different human neoplastic tissues demonstrated that these transcripts are differentially expressed among human cancers.43 In particular, RNA-Seq and microarray studies have shown that breast cancer is characterized by specific lncRNA expression profiles among molecular subtypes (luminal, HER2-enriched, and basal)15 that have been related to prognostic variables, such as OS, recurrence, progression, metastasis risk, treatment efficacy, and resistance to treatment.44,45 Because of their association with treatment resistance, several lncRNAs have been identified for potential predictive use, such as HOTAIR in endocrine therapy,46 lncRNA-ATB in antibody administration,47 and LINC00472 in adjuvant chemotherapy.48 However, little is known about the association of lncRNAs and response to neoadjuvant chemotherapy, which is the standard treatment for breast cancer patients with locally advanced disease.49,50 Several studies based on microarray assays have shown that lncRNAs are related to pathologic complete response in breast cancer patients in all molecular subtypes.24,51 To date, only a few RNA-Seq transcriptome studies of NAC resistance have been performed52–54 to better understand the molecular biology of resistance to NAC in LABC patients.
In this study, a subset of long noncoding genes differentially expressed in LABC patients with the luminal B-like phenotype who did not respond to NAC treatment was identified. From this subset, only the most characterized lncRNAs were selected, which identified the divergent lncRNA GATA3-AS1, previously reported to be overexpressed in breast cancer patients.51 In addition, GATA3-AS1, RP11-279F6, and AC017048 showed specific and high expression levels in ER-positive (ER+) compared with ER-negative (ER−) cancers and normal breast tissue samples.51 GATA3-AS1 belongs to a class of divergent lncRNAs; it is located on chromosome 10 and is approximately 2 kb long. The transcription start site of this noncoding gene is approximately 1 kb from the first exon of the adjacent GATA3 gene; the transcript is synthesized from the antisense strand and consists of two exons.38 Experimental validation by RT-qPCR of GATA3-AS1 expression in breast cancer cell lines MCF-7, MDA-MB-231, and BT474 demonstrated an association between GATA3-AS1 overexpression and neoplastic disease in mammary cells. This was further corroborated in samples from LABC patients in a discovery phase and in an independent breast cancer patient cohort (validation phase), where overexpression of GATA3-AS1 was associated with patients who do not respond to systemic treatment, which is indicative of the relationship between GATA3-AS1 overexpression and resistance to NAC. Moreover, multivariate logistic regression analysis supported this association, showing that GATA3-AS1 overexpression is an independent and statistically significant predictor of response in this model, with 37.49% (95% CI, 6.74%–208.42%). In addition, high sensitivity and specificity (92.9% and 75%, respectively) with an AUC value of approximately 0.90 suggested that GATA3-AS1 is a potential biomarker for predicting NAC response in clinical practice to improve therapy for LABC patients with the luminal B-like phenotype. Several preclinical studies have identified lncRNAs that have been proposed as response biomarkers despite the limited number of patient samples used, such as UCA1,55 TP53COR1 (or lincRNA-p21),56 GAS5,57 and HOTAIR46; however, the clinical application of each lncRNA is specific to the type of treatment and the cancer patients in which it was evaluated. The lncRNA GATA3-AS1, despite having been validated in a small number of patients, might be an important biomarker of response to chemotherapy, similar to the previously mentioned lncRNAs. Further studies in a large cohort of patients will be needed to accurately evaluate the potential use of this lncRNA as a clinical predictive biomarker in the luminal B-like phenotype.
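For readers less familiar with how sensitivity, specificity, and AUC are derived from an expression cutoff, a minimal Python sketch follows. It is an illustration only, not the analysis performed in this study: the function names and the expression values are hypothetical, and nonresponders are treated as the positive class because overexpression is the feature proposed to predict nonresponse.

```python
def sensitivity_specificity(values, is_nonresponder, cutoff):
    """Classify a sample as a predicted nonresponder when value >= cutoff."""
    pairs = list(zip(values, is_nonresponder))
    tp = sum(1 for v, y in pairs if y and v >= cutoff)
    fn = sum(1 for v, y in pairs if y and v < cutoff)
    tn = sum(1 for v, y in pairs if not y and v < cutoff)
    fp = sum(1 for v, y in pairs if not y and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(values, is_nonresponder):
    """AUC as the probability that a randomly chosen nonresponder has a
    higher value than a randomly chosen responder (ties count as 0.5)."""
    pos = [v for v, y in zip(values, is_nonresponder) if y]
    neg = [v for v, y in zip(values, is_nonresponder) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical relative expression values (not patient data).
expression = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
nonresponder = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(expression, nonresponder, cutoff=0.5)
area = auc(expression, nonresponder)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.4f}")
```

The pairwise definition of AUC used here is equivalent to the area under the receiver operating characteristic curve and makes explicit that the AUC does not depend on any single cutoff, whereas sensitivity and specificity do.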
Furthermore, patients in the luminal B group are of clinical interest because luminal B is a specific subtype that contains clinically aggressive ER-positive breast cancers in which patients present an intermediate prognosis with a wide variety of responses to different treatments.58 In particular, it was determined that GATA3-AS1 is overexpressed in luminal B-like patients who do not respond to neoadjuvant chemotherapy. A clear association was observed between GATA3-AS1 overexpression and nonresponders; however, no association with patient outcome was found when OS was assessed against GATA3-AS1 overexpression. Interestingly, the expression profile of GATA3-AS1 is similar to that of other lncRNAs, such as DSCAM-AS1, whose expression is not directly associated with prognosis but rather with response to treatment or progression of the disease.20 Moreover, both lncRNAs exhibited higher expression in the luminal B patients analyzed. It is also necessary to include patients diagnosed with luminal A breast cancer subtypes to elucidate whether it is possible to extend the predictive value of GATA3-AS1 to this group and to generalize its use as a molecular biomarker. lncRNAs are known to be useful factors in treatment selection, independently of molecular cancer subtype15; however, this will likely require the analysis of additional large cohorts that include both hormone-positive subtypes.
Conversely, the coding gene GATA3 has a wide expression pattern in patients and breast cancer cell lines, as observed by RNA-Seq analysis. As previous studies have demonstrated, mRNAs tend to have less tissue- and stage-specific expression42,59,60 in contrast to lncRNAs, which tend to have more tissue-specific19,61,62 and stage-specific expression in disease, which is one of the main reasons lncRNAs have been proposed as molecular biomarkers in cancer.16,63,64 In addition, receiver operating characteristic curve analysis showed that GATA3 had a sensitivity of 52.4% and a specificity of 73.9% with an AUC of 0.60 in RT-qPCR expression analyses; these results suggest that GATA3 mRNA expression has a low ability to distinguish responders from nonresponders to NAC because of its low sensitivity. Currently, GATA3 is a molecular marker in breast cancer that has had a controversial clinical role.39 GATA3 protein expression has been associated with a favorable prognosis and increased survival in patients with invasive breast carcinoma; however, GATA3 has not been shown to be a reliable prognostic factor regardless of ER status.65–67 Likewise, clinical data regarding the role of GATA3 in treatment response prediction have also been controversial.68,69 There is evidence suggesting that GATA3 could be used as a biomarker for predicting response to NAC. In a multivariate analysis, absence of GATA3 was an independent predictor of pathologic complete response to neoadjuvant chemotherapy, suggesting that GATA3 might be clinically useful as a predictor of poor response to chemotherapy.70 However, that study did not evaluate the sensitivity and specificity of GATA3; thus, more studies are necessary to determine the implications of GATA3 as a predictive biomarker of NAC response. To date, there is controversy over the use of GATA3 as a predictive biomarker for NAC69–71 and, because of the low sensitivity found in the present study, the coding gene GATA3 cannot be proposed as a biomarker for NAC resistance, even though it is co-expressed with GATA3-AS1. The results obtained in this study indicate that there is a moderate correlation between GATA3-AS1 and GATA3 mRNA expression. However, it has been suggested in the literature that, in breast cancer, GATA3-AS1 does not regulate GATA3 mRNA expression; instead, GATA3-AS1 regulates the GATA3 protein accumulation level through a degradation mechanism.72 Therefore, GATA3 protein levels, measured by immunohistochemistry, cannot distinguish patients who will be resistant to neoadjuvant chemotherapy, but the possibility that GATA3 protein accumulation levels could be regulated by GATA3-AS1, as has been demonstrated in triple-negative breast cancer cell lines, cannot be ruled out.72 However, this is beyond the scope of this study, and further research is needed. Thus, in locally advanced breast cancer of the luminal B-like phenotype, regulation of GATA3 mRNA may not depend on GATA3-AS1 despite the positive correlation between the expression levels of both transcripts.
On the other hand, there is scientific evidence that altered gene expression in breast cancer tumors is involved in neuronal-related pathways73 and processes.74–76 The present results showed that tumors of nonresponders to NAC overexpressed lncRNA genes that are related to Kyoto Encyclopedia of Genes and Genomes pathways, such as taste transduction and Parkinson disease. This is in accordance with other reports in which lncRNA expression in breast cancer cells is related to the dysregulation of neuronal-related pathways, such as lncRNA IRAIN and its targets involved in cholinergic synapses77 and the lncRNA-mRNA coexpression network, which is related to taste transduction in docetaxel-resistant breast cancer cell lines.78 In addition, this study showed that the mRNA expression profile in breast cancer tumors is involved in neuronal system signaling, as has been described for other processes, such as cranial nerve and neural crest development in breast cancer patients.76,79 Together, these data indicate that genes involved in neuronal processes are regulated by lncRNAs in breast cells and, in cancer conditions, might acquire oncogenic potential, contributing to breast cancer progression, resistance to treatment, or metastasis development, as shown for lncRNA BORG, which is associated with brain metastasis.80
Finally, the divergent lncRNA GATA3-AS1 is a noncoding transcript described in T lymphocytes with an important role in T-cell differentiation,38 and it has been associated with respiratory pathologies, such as asthma and rhinitis.81 In this study, it was found that GATA3-AS1 is also expressed in mammary cells, but until now, the function of GATA3-AS1 in breast tissue has not been described, as it has been in hepatocellular carcinoma.82 Further research is needed to establish the function and molecular mechanisms of GATA3-AS1 in mammary tissue and how it is associated with breast neoplastic disease and NAC resistance.
In conclusion, RNA-Seq analysis demonstrated the presence of an lncRNA profile capable of identifying LABC luminal B-like patients who do not respond to NAC. In particular, the divergent lncRNA GATA3-AS1 showed high specificity and sensitivity associated with its predictive value in nonresponders to NAC treatment, making it the first molecular biomarker with a potential use in clinical practice in the prediction of NAC treatment response in breast cancer. Further investigation is needed to discover whether its predictive value is applicable to other molecular subtypes and to uncover the molecular mechanisms of this lncRNA in NAC resistance.

Figure 1 Experimental research design. A: The study was divided into two phases, discovery and validation, with snap-frozen pretreatment core
Contreras-Espinosa et al, jmdjournal.org, The Journal of Molecular Diagnostics

Figure 3 Overexpressed and underexpressed long noncoding RNAs (lncRNAs), as well as machine learning analysis, define nonresponders to neoadjuvant chemotherapy. A: Venn diagram showing the overlap (blue) of lncRNAs differentially underexpressed in the nonresponder group, identified by Salmon + DESeq2 (green) and STAR + featureCounts (fc) + DESeq2 (purple) analysis. B: Venn diagram showing the overlap (orange) of lncRNAs differentially overexpressed in the nonresponder group, identified by Salmon + DESeq2 (pink) and STAR + fc + DESeq2 (yellow) analysis. C: Random forest analysis showing the mean decrease in accuracy of lncRNA (box plot; left panel) and quartile CV (bar plot; right panel), showing the 50 most significant lncRNAs in identifying nonresponder patients. The most important lncRNAs for predicting treatment response in the model are indicated in green. DE, differential expression.

Figure 4 GATA3-AS1 is overexpressed in breast cancer (BC) cell lines and in patients with resistance to neoadjuvant chemotherapy. A: The histograms show RNA-sequencing (RNA-Seq) mapped reads in the GATA3-AS1 genomic locus using the genome hg38. Canonical GATA3-AS1 transcript and its six isoforms are shown [Ensembl identifier (ID), https://www.ensembl.org/index.html, last accessed February 25, 2021; top panel]. Nonresponder patients are represented in the top panel, whereas responder patients are represented in the bottom panel. B: Bar plot showing transcript per million (TPM) normalized expression of GATA3-AS1 in nonresponder (gray) and responder (black) patients. C: Histograms show RNA-Seq mapped reads in the GATA3-AS1 genomic locus using the genome hg19. The canonical GATA3-AS1 transcript and its six isoforms are shown in the top panel (Ensembl ID). Breast cancer cell lines and transformed noncancerous breast cell line data were obtained from the Cancer Cell Line Encyclopedia (https://portals.broadinstitute.org/ccle, last accessed February 25, 2021). D: Bar plot showing reads per kilobase of transcript per million reads mapped (RPKM) normalized expression of GATA3-AS1 in 33 breast cancer cell lines (gray) and 2 transformed noncancerous breast cell lines (black boxed area). Data were obtained from the Library of Integrated Network-Based Cellular Signatures database (http://lincs.hms.harvard.edu/db/datasets, last accessed May 14, 2020). *Canonical GATA3-AS1 transcript. †Unreported isoform of GATA3-AS1 in the Ensembl database; the ID was reported according to University of California, Santa Cruz, RefSeq annotation. n = 11 (A).

Figure 5 Higher relative expression levels of GATA3-AS1 are associated with response to neoadjuvant chemotherapy. A: Bar plot of RT-qPCR quantification by fold change calculation (ΔΔCt), showing overexpression of GATA3-AS1 in breast cancer cell lines MCF-7, BT474, and MDA-MB-231, in which MCF-7 shows the highest GATA3-AS1 overexpression (analysis of variance with the Tukey test). B: Box plot comparing GATA3-AS1 relative expression between responder and nonresponder patients; a nonparametric U-test/Wilcoxon test was implemented, showing differences between medians. C: Receiver operating characteristic curve analysis indicates that GATA3-AS1 overexpression predicts nonresponse in neoadjuvant chemotherapy patients (cutoff = 0.000064; relative expression is normalized with RPS28 as the housekeeping gene) and is associated with a high sensitivity (92.9%) and specificity (75.0%), with P = 0.0001 and area under the curve = 0.876. D: Tumor purity analysis of the 11 patients from RNA-sequencing analysis (gray dots correspond to the nonresponders, and black dots correspond to responders). n = 3 (A); n = 68 (B and C). ****P < 0.0001 versus MCF-10A; †P < 0.05 versus nonresponders. ER, estrogen receptor; HER2, human epidermal growth factor receptor 2.
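The fold change calculation by ΔΔCt named in panel A can be sketched as follows. This is a minimal Python illustration of the standard 2^(−ΔΔCt) approach, assuming 100% amplification efficiency; the function name and the Ct values are hypothetical, and only the use of a housekeeping gene (RPS28) and a calibrator cell line (MCF-10A) is taken from the caption.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(housekeeping gene), per sample
    ddCt = dCt(sample) - dCt(calibrator)
    Assumes roughly 100% amplification efficiency for both genes.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (not measured data): target gene versus a
# housekeeping gene in a cancer cell line and in a calibrator cell line.
fold = fold_change_ddct(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_calibrator=29.0, ct_ref_calibrator=18.0,
)
print(f"Fold change relative to calibrator: {fold:.1f}")
```

Because Ct is inversely related to transcript abundance, a lower ΔCt in the sample than in the calibrator yields a fold change greater than 1.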

Functional and Pathway Analysis of mRNAs in Patients Nonresponsive to NAC

Figure 6 Functional analysis of long noncoding RNAs (lncRNAs) in nonresponder patients to neoadjuvant chemotherapy. Significantly enriched pathways for lncRNAs differentially expressed in the nonresponder group. Functional enrichment analysis was performed for underexpressed lncRNAs (left side), overexpressed lncRNAs (middle), and most significant lncRNAs (right side). The rows show the set that was analyzed, and columns show the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The more significant adjusted P value is indicated by the intensity of the red color, and the enrichment level of pathways is indicated by the normalized enrichment score (NES) based on the size of the dot, as indicated.

Figure 7 Functional and pathway analysis of mRNAs in nonresponder patients to neoadjuvant chemotherapy. A: Top 20 enriched biological processes of differentially expressed mRNAs in the nonresponder group. The significance of the P value is indicated by the intensity of the red color. B: Interaction networks of the top 20 enriched biological processes. Nodes are indicated with dots, and relationships between nodes are indicated by gray lines. C: Enrichment map for differentially expressed mRNAs in the nonresponder group. Columns show the mRNA symbol, and rows show the enriched term for biological processes. The enrichment fold change is indicated by the intensity of the color scales. D: Network of gene pathways showing differentially expressed mRNAs in the nonresponder group. Gene names are indicated in red and green dots, whereas gene category is indicated in gray dots. Nodes are indicated with dots, and relationships between nodes are indicated by gray lines. The number of genes of every node is indicated by the size of the dot, and the color of dots indicates the fold change (maximum fold change in red, and minimum fold change in green).

Table 2 Clinicopathologic Characteristics of the Breast Cancer Patients Analyzed
Data are given as n (%), unless otherwise indicated. *Statistically significant. HER2, human epidermal growth factor receptor 2; IDC, infiltrating ductal carcinoma; ILC, infiltrating lobular carcinoma; P, percentile.

Figure 2 Transcriptome profiling of long noncoding RNAs (lncRNAs) in breast cancer (BC) patients is associated with nonresponder patients. A: Pie charts showing the proportion of underexpressed (left panel) and overexpressed (right panel) RNA biotypes in the nonresponder group. mRNA, lncRNA, and other RNA biotypes are indicated in blue, yellow, and green, respectively. Below is a volcano plot of the identified differentially expressed lncRNAs in nonresponder patients. Blue, yellow, and green dots correspond to mRNA, lncRNA, and other RNAs, respectively [false discovery rate (FDR) < 0.05, log2 fold change > 1.5 for up-regulation and < −1.5 for down-regulation]. Red dots correspond to RNA with no significant changes in the nonresponder group. B: Heat map of hierarchical clustering analysis of the top 44 differentially expressed lncRNAs between responder and nonresponder patients. Rows and columns represent differentially expressed lncRNAs and tissue samples, respectively. The color scale represents expression levels. Red and green colors represent up-regulated and down-regulated lncRNAs, respectively (FDR < 0.05). n = 11 (B). FC, fold change.

GATA3-AS1 Is Overexpressed in Breast Cancer Cell Lines and in Patients Resistant to NAC
To corroborate the expression profile of GATA3-AS1, RNA-Seq data from breast cancer patients and breast cancer cell lines were used. For this purpose, expression levels of GATA3-AS1 were evaluated from RNA-Seq data in patients derived from the discovery phase. It was found that in samples from nonresponder patients, GATA3-AS1 was overexpressed compared with responder patients, showing basal expression when visualized by histograms (Figure

Table 3 Bivariate and Multivariate Analysis to Identify Clinical Variables Related to GATA3-AS1 Expression in Nonresponder Patients Treated with NAC