Web Interface

Introduction

The Cancer Genomic Data Server (CGDS) web service interface provides direct programmatic access to all genomic data stored within the server. This enables you to easily access data from your favorite programming language, such as Python, Java, Perl, R or MATLAB. The CGDS web service is REST-based: client applications create a query consisting of parameters appended to a URL, and receive back either text or an XML response. For CGDS, all responses are currently tab-delimited text. Clients of the CGDS web service can issue the following types of queries:

  • What cancer studies are stored on the server?
  • What genetic profile types are available for cancer study X? For example, does the server store mutation and copy number data for the TCGA Glioblastoma data?
  • What case sets are available for cancer study X? For example, what case sets are available for TCGA Glioblastoma?

Additionally, clients can easily retrieve "slices" of genomic data. For example, a client can retrieve all mutation data from PTEN and EGFR in the TCGA Glioblastoma data.

Please note that the example queries below are accurate, but they are not guaranteed to return data, as our database is constantly being updated.

The CGDS R Package

If you are interested in accessing CGDS via R, please check out our CGDS-R library.

Basic Query Syntax

All web queries are available at: webservice.do. All calls to the Web interface are constructed by appending URL parameters. Within each call, you must specify:

  • cmd = the command that you wish to execute. The command must be equal to one of the following: getTypesOfCancer, getNetwork, getCancerStudies, getGeneticProfiles, getProfileData, getCaseLists, getClinicalData, getMutationData, getProteinArrayInfo, or getProteinArrayData.
  • optional additional parameters, depending on the command (see below).

For example, the following query will request all case lists for the TCGA GBM data:

webservice.do?cmd=getCaseLists&cancer_study_id=gbm_tcga
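A minimal Python sketch of assembling such a query with the standard library. BASE_URL is a placeholder assumption: the page gives only the relative path webservice.do, so substitute the full address of the server you are querying. The helper names build_query and fetch are hypothetical, and decoding the body as UTF-8 is also an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder (an assumption): replace with the full URL of webservice.do on your server.
BASE_URL = "webservice.do"


def build_query(cmd, **params):
    """Return a query URL with cmd first, followed by any extra parameters."""
    query = {"cmd": cmd}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


def fetch(url):
    """Fetch a query URL and return the response body as text (needs an absolute URL)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

With the placeholder base, build_query("getCaseLists", cancer_study_id="gbm_tcga") returns exactly the example query above.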

Response Header and Error Messages

The first line of each response begins with a hash mark (#), and will contain data regarding the server status. For example:

 # CGDS Kernel:  Data served up fresh at:  Wed Oct 27 13:02:30 EDT 2010

If any errors occurred while processing your query, they appear directly after the status message. Error messages begin with the "Error:" tag, and warning messages begin with the "# Warning:" tag. Unrecoverable errors are reported as errors. For example:

 # CGDS Kernel:  Data served up fresh at:  Wed Oct 27 13:02:30 EDT 2010
 Error:  No case lists available for cancer_study_id:  gbs.

Recoverable errors, such as invalid gene symbols, are reported as warnings. Multiple warnings may also be returned. For example:

 # CGDS Kernel:  Data served up fresh at:  Wed Oct 27 13:06:34 EDT 2010
 # Warning:  Unknown gene:  EGFR11
 # Warning:  Unknown gene:  EGFR12
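A client typically needs to separate the status line, warnings and errors from the tab-delimited data before parsing. The sketch below keys only on the prefixes described above ("#", "# Warning:", "Error:"); the function name split_response is hypothetical.

```python
def split_response(text):
    """Split a CGDS response into (status, warnings, error, data_lines)."""
    status, error = None, None
    warnings, data = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("# Warning:"):
            warnings.append(stripped[len("# Warning:"):].strip())
        elif stripped.startswith("#"):
            if status is None:
                status = stripped  # first "#" line: the server status message
        elif stripped.startswith("Error:"):
            error = stripped[len("Error:"):].strip()
        else:
            data.append(line)  # anything else is treated as tab-delimited data
    return status, warnings, error, data
```

A client would usually stop when error is not None and log any warnings before parsing data.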

Deprecated API

As of August 2011:

  • In previous versions of the API, the getCancerStudies command was referred to as getCancerTypes. For backward compatibility, getCancerTypes still works, but is now considered deprecated.

  • In previous versions of the API, the cancer_study_id parameter was referred to as cancer_type_id. For backward compatibility, cancer_type_id still works, but is now considered deprecated.
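Older client code can be migrated by rewriting the two deprecated names before a query is sent. This is a hypothetical client-side helper (modernize is not part of CGDS) that covers only the two renamings listed above.

```python
# Deprecated names from the list above, mapped to their current equivalents.
DEPRECATED_COMMANDS = {"getCancerTypes": "getCancerStudies"}
DEPRECATED_PARAMS = {"cancer_type_id": "cancer_study_id"}


def modernize(params):
    """Return a copy of a query-parameter dict that uses only current names."""
    updated = {}
    for key, value in params.items():
        key = DEPRECATED_PARAMS.get(key, key)
        if key == "cmd":
            value = DEPRECATED_COMMANDS.get(value, value)
        updated[key] = value
    return updated
```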

Commands

Get All Types of Cancer

Description

Retrieves a list of all the clinical types of cancer stored on the server.

Query Format

  • cmd=getTypesOfCancer (required)

Response Format

A tab-delimited file with two columns:

  • type_of_cancer_id: a unique text identifier used to identify the type of cancer. For example, "gbm" identifies Glioblastoma multiforme.
  • name: short name of the type of cancer.
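A sketch of turning this two-column response into a lookup table. It assumes the server may send a column-header row naming type_of_cancer_id (skipped if present); the function name and sample data in any usage are hypothetical.

```python
def parse_types_of_cancer(text):
    """Map type_of_cancer_id -> name from a getTypesOfCancer response."""
    types = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "# CGDS Kernel" status line
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "type_of_cancer_id":
            continue  # skip "Error:" lines (no tab) and the header row, if sent
        types[fields[0]] = fields[1]
    return types
```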

Example

Get all types of cancer:

webservice.do?cmd=getTypesOfCancer

Get All Cancer Studies

Description

Retrieves meta-data regarding cancer studies stored on the server.

Query Format

  • cmd=getCancerStudies (required)

Response Format

A tab-delimited file with three columns:

  • cancer_study_id: a unique text ID, such as gbm_tcga, that should be used to identify the cancer study in subsequent interface calls.
  • name: short name of the cancer study.
  • description: short description of the cancer study.
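For responses with several named columns, Python's csv module can produce one dict per row. This sketch assumes the first non-comment line is a header row naming the columns listed above; QUOTE_NONE keeps any quote characters in descriptions literal. The name parse_table is hypothetical.

```python
import csv
import io


def parse_table(text):
    """Parse a tab-delimited CGDS response into a list of dicts, one per row."""
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith("#") and not line.startswith("Error:")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)
```

For a getCancerStudies response, each row would then expose row["cancer_study_id"], row["name"] and row["description"].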

Example

Get all cancer studies:

webservice.do?cmd=getCancerStudies

Get All Genetic Profiles for a Specific Cancer Study

Description

Retrieves meta-data regarding all genetic profiles, e.g. mutation or copy number profiles, stored about a specific cancer study.

Query Format

  • cmd=getGeneticProfiles (required)
  • cancer_study_id=[cancer study ID] (required)

Response Format

A tab-delimited file with six columns:

  • genetic_profile_id: a unique ID used to identify the genetic profile in subsequent interface calls. This is a human readable ID. For example, "gbm_mutations" identifies the TCGA GBM mutation genetic profile.
  • genetic_profile_name: short profile name.
  • genetic_profile_description: short profile description.
  • cancer_study_id: cancer study ID tied to this genetic profile. Will match the input cancer_study_id.
  • genetic_alteration_type: indicates the profile type. Will be one of:
    • MUTATION
    • MUTATION_EXTENDED
    • COPY_NUMBER_ALTERATION
    • MRNA_EXPRESSION
    • METHYLATION
  • show_profile_in_analysis_tab: a boolean flag used for internal purposes (you can safely ignore it).
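A common next step is to pick the profile of a given type, for example the MUTATION_EXTENDED profile to pass to getMutationData. This sketch assumes a header row with the column names above; the function name is hypothetical.

```python
def profiles_of_type(text, alteration_type):
    """Return genetic_profile_id values whose genetic_alteration_type matches."""
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]
    id_col = header.index("genetic_profile_id")
    type_col = header.index("genetic_alteration_type")
    return [row[id_col] for row in body if row[type_col] == alteration_type]
```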

Example

Get all genetic profiles for Glioblastoma (TCGA):

webservice.do?cmd=getGeneticProfiles&cancer_study_id=gbm_tcga

Get All Case Lists for a Specific Cancer Study

Description

Retrieves meta-data regarding all case lists stored about a specific cancer study. For example, within a particular study, only some cases may have sequence data, and another subset of cases may have been sequenced and treated with a specific therapeutic protocol. Multiple case lists may be associated with each cancer study, and this method enables you to retrieve meta-data regarding all of these case lists.

Query Format

  • cmd=getCaseLists (required)
  • cancer_study_id=[cancer study ID] (required)

Response Format

A tab-delimited file with five columns:

  • case_list_id: a unique ID used to identify the case list in subsequent interface calls. This is a human readable ID. For example, "gbm_all" identifies all cases profiled in the TCGA GBM study.
  • case_list_name: short name for the case list.
  • case_list_description: short description of the case list.
  • cancer_study_id: cancer study ID tied to this case list. Will match the input cancer_study_id.
  • case_ids: space delimited list of all case IDs that make up this case list.
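Because case_ids packs a whole case list into one space-delimited field, a client has to split it. A sketch assuming a header row with the column names above; the case IDs in any sample are hypothetical.

```python
def case_lists(text):
    """Map case_list_id -> list of case IDs from a getCaseLists response."""
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]
    id_col = header.index("case_list_id")
    ids_col = header.index("case_ids")
    # str.split() with no argument splits on runs of whitespace.
    return {row[id_col]: row[ids_col].split() for row in body}
```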

Example

Get all case lists for Glioblastoma (TCGA):

webservice.do?cmd=getCaseLists&cancer_study_id=gbm_tcga

Get Profile Data

Description

Retrieves genomic profile data for one or more genes.

Query Format

  • cmd=getProfileData (required)
  • case_set_id= [case set ID] (required)
  • genetic_profile_id= [one or more genetic profile IDs] (required). Multiple genetic profile IDs must be separated by comma (,) characters, or URL encoded spaces, e.g. +
  • gene_list= [one or more genes, specified as HUGO Gene Symbols or Entrez Gene IDs] (required). Multiple genes must be separated by comma (,) characters, or URL encoded spaces, e.g. +
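One way to send several profile IDs or genes is to join them with spaces and let urlencode turn each space into +, one of the separators described above. The base path and the IDs "gbm_all" and "gbm_mutations" are taken from this page; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def profile_data_url(case_set_id, genetic_profile_ids, genes, base="webservice.do"):
    """Build a getProfileData query; list values are space-joined (encoded as +)."""
    params = {
        "cmd": "getProfileData",
        "case_set_id": case_set_id,
        "genetic_profile_id": " ".join(genetic_profile_ids),
        "gene_list": " ".join(genes),
    }
    return base + "?" + urlencode(params)
```

For example, profile_data_url("gbm_all", ["gbm_mutations"], ["PTEN", "EGFR"]) yields webservice.do?cmd=getProfileData&case_set_id=gbm_all&genetic_profile_id=gbm_mutations&gene_list=PTEN+EGFR.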

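You can either request one or more genes for a single genetic profile ID, or a single gene across multiple genetic profile IDs. The response format depends on which you choose.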

Response Format 1

When requesting one or multiple genes and a single genetic profile ID (see above), you will receive a tab-delimited matrix with the following columns:

  1. GENE_ID: Entrez Gene ID
  2. COMMON: HUGO Gene Symbol
  3. Columns 3 - N: Data for each case

Response Format 2

When requesting a single gene and multiple genetic profile IDs (see above), you will receive a tab-delimited matrix with the following columns:

  1. GENETIC_PROFILE_ID: The Genetic Profile ID.
  2. ALTERATION_TYPE: The Genetic Alteration Type, e.g. MUTATION, MUTATION_EXTENDED, COPY_NUMBER_ALTERATION, or MRNA_EXPRESSION.
  3. GENE_ID: Entrez Gene ID.
  4. COMMON: HUGO Gene Symbol.
  5. Columns 5 - N: Data for each case.
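A sketch of reading Response Format 1 into a nested dict keyed by gene symbol and case ID. It assumes a header row of GENE_ID, COMMON and then one column per case ID, and leaves values as strings because their meaning depends on the profile type. The values in any sample are hypothetical.

```python
def parse_profile_matrix(text):
    """Parse Response Format 1 into {gene_symbol: {case_id: value}}."""
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]
    case_ids = header[2:]  # columns 3 - N hold one value per case
    return {row[1]: dict(zip(case_ids, row[2:])) for row in body}
```

For Response Format 2 the same approach works with case data starting at the fifth column (header[4:]) and GENETIC_PROFILE_ID as the key.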

Examples

See Query Format above.

Get Extended Mutation Data

Description

For data of type MUTATION_EXTENDED, you can request the full set of annotated extended mutation data. This enables you, for example, to determine which sequencing center sequenced the mutation, find the amino acid change that results from the mutation, or gather links to predicted functional consequences of the mutation.

Query Format

  • cmd=getMutationData (required)
  • genetic_profile_id= [one or more mutation profile IDs] (required). Multiple genetic profile IDs must be separated by comma (,) characters, or URL encoded spaces, e.g. +
  • case_set_id= [case set ID] (optional). If not provided, all cases that have data in the specified mutation profiles will be queried.
  • gene_list= [one or more genes, specified as HUGO Gene Symbols or Entrez Gene IDs] (required). Multiple genes must be separated by comma (,) characters, or URL encoded spaces, e.g. +

Response Format

A tab-delimited file with the following columns:

  • entrez_gene_id: Entrez Gene ID.
  • gene_symbol: HUGO Gene Symbol.
  • case_id: Case ID.
  • sequencing_center: sequencing center responsible for identifying this mutation. For example: broad.mit.edu.
  • mutation_status: somatic or germline mutation status. All mutations returned will be of type somatic.
  • mutation_type: mutation type, such as nonsense, missense, or frameshift_ins.
  • validation_status: validation status. Usually valid, invalid, or unknown.
  • amino_acid_change: amino acid change resulting from the mutation.
  • functional_impact_score: functional impact score, as predicted by Mutation Assessor.
  • xvar_link: Link to the Mutation Assessor web site.
  • xvar_link_pdb: Link to the Protein Data Bank (PDB) view within the Mutation Assessor web site.
  • xvar_link_msa: Link to the Multiple Sequence Alignment (MSA) view within the Mutation Assessor web site.
  • chr: chromosome where mutation occurs.
  • start_position: start position of mutation.
  • end_position: end position of mutation.
  • genetic_profile_id: mutation profile id.
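As an illustration of working with these columns, the sketch below counts mutation records per gene_symbol, optionally restricted to one mutation_type. It assumes a header row with the column names above; the sample used in practice would carry all listed columns, but the function reads only gene_symbol and mutation_type. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def mutation_counts(text, mutation_type=None):
    """Count mutation records per gene_symbol, optionally for one mutation_type."""
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if mutation_type is None or row["mutation_type"] == mutation_type:
            counts[row["gene_symbol"]] += 1
    return counts
```

Note that this counts records, not distinct cases; a case with two PTEN mutations contributes twice.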

Example

Get Clinical Data

Description

Retrieves overall survival, disease free survival and age at diagnosis for specified cases. Due to patient privacy restrictions, no other clinical data is available.

Query Format

  • cmd=getClinicalData (required)
  • case_set_id= [case set ID] (required)

Response Format

A tab-delimited file with the following columns:

  • case_id: Unique Case Identifier.
  • overall_survival_months: Overall survival, in months.
  • overall_survival_status: Overall survival status, usually indicated as "LIVING" or "DECEASED".
  • disease_free_survival_months: Disease free survival, in months.
  • disease_free_survival_status: Disease free survival status, usually indicated as "DiseaseFree" or "Recurred/Progressed".
  • age_at_diagnosis: Age at diagnosis.
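For survival analysis a client usually needs numeric months and an event flag. This sketch assumes a header row with the column names above and treats any non-numeric overall_survival_months (for example a hypothetical "NA") as missing; the function name is also hypothetical.

```python
def survival_records(text):
    """Return (case_id, months, deceased) for cases with numeric overall survival."""
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    records = []
    for row in body:
        try:
            months = float(row[col["overall_survival_months"]])
        except (ValueError, IndexError):
            continue  # skip cases whose survival value is missing or non-numeric
        deceased = row[col["overall_survival_status"]] == "DECEASED"
        records.append((row[col["case_id"]], months, deceased))
    return records
```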

Example

Get Clinical Data for All TCGA Ovarian Cases.

Get Protein/Phosphoprotein Antibody Information

Description

Retrieves information on antibodies used by reverse-phase protein arrays (RPPA) to measure protein/phosphoprotein levels.

Query Format

  • cmd=getProteinArrayInfo (required)
  • cancer_study_id= [cancer study ID] (required)
  • protein_array_type= [protein_level or phosphorylation]
  • gene_list= [one or more genes, specified as HUGO Gene Symbols or Entrez Gene IDs]. Multiple genes must be separated by comma (,) characters, or URL encoded spaces, e.g. +

Response Format

You will receive a tab-delimited matrix with the following 4 columns:

  • ARRAY_ID: The protein array ID.
  • ARRAY_TYPE: The protein array antibody type, i.e. protein_level or phosphorylation.
  • GENE: The targeted gene name (HUGO gene symbol).
  • RESIDUE: The targeted residue(s).
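A sketch that groups antibody ARRAY_IDs by targeted gene and by ARRAY_TYPE, which makes it easy to see whether a gene has both total-protein and phospho antibodies. It assumes a header row starting with ARRAY_ID and uses the GENE field as-is; the function name and any array IDs are hypothetical.

```python
from collections import defaultdict


def antibodies_by_gene(text):
    """Group antibody ARRAY_IDs by targeted GENE, then by ARRAY_TYPE."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 4 or fields[0] == "ARRAY_ID":
            continue  # skip status/warning lines, "Error:" lines, and the header row
        array_id, array_type, gene, _residue = fields[:4]
        grouped[gene][array_type].append(array_id)
    return {gene: dict(by_type) for gene, by_type in grouped.items()}
```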

Example

Get RPPA-based Proteomics Data

Description

Retrieves protein and/or phosphoprotein levels measured by reverse-phase protein arrays (RPPA).

Query Format

  • cmd=getProteinArrayData (required)
  • case_set_id= [case set ID] (required)
  • array_info= [1 or 0]. If 1, antibody information will also be exported.

Response Format 1

If array_info is not specified, or is not 1, you will receive a tab-delimited matrix with the following columns:

  • ARRAY_ID: The protein array ID.
  • Columns 2 - N: Data for each case.

Response Format 2

If the parameter of array_info is 1, you will receive a tab-delimited matrix with the following columns:

  • ARRAY_ID: The protein array ID.
  • ARRAY_TYPE: The protein array antibody type, i.e. protein_level or phosphorylation.
  • GENE: The targeted gene name (HUGO gene symbol).
  • RESIDUE: The targeted residue(s).
  • Columns 5 - N: Data for each case.
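The two formats differ only in where the case columns start (column 2 without array_info, column 5 with it), so one parser can handle both. This sketch assumes a header row naming the case IDs and maps non-numeric entries (such as a hypothetical "NA") to None; the function name is hypothetical.

```python
def parse_rppa(text, array_info=False):
    """Parse getProteinArrayData output into {ARRAY_ID: {case_id: float or None}}."""
    first_case_col = 4 if array_info else 1  # Format 2 has four annotation columns
    rows = [line.split("\t") for line in text.splitlines()
            if line.strip() and not line.startswith("#")]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    cases = header[first_case_col:]
    result = {}
    for row in body:
        values = {}
        for case, raw in zip(cases, row[first_case_col:]):
            try:
                values[case] = float(raw)
            except ValueError:
                values[case] = None  # entries that float() cannot parse become None
        result[row[0]] = values
    return result
```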

Example

Linking to Us

Once you have a cancer_study_id, it is very easy to create stable links from your web site to the cBio Portal. Stable links must point to ln, and can include the following parameters:

  • q=[a query following Onco Query Language, e.g. a space separated list of HUGO gene symbols] (required)
  • cancer_study_id=[cancer study ID] (optional; if not specified, a cross-cancer query is performed)
  • report=[report to display; can be one of: full (default), oncoprint_html]

For example, here is a link to the TCGA GBM data for EGFR and NF1:

ln?cancer_study_id=gbm_tcga&q=EGFR+NF1

And a link to EGFR mutations across all cancer studies:

ln?q=EGFR:MUT
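Such links can also be generated programmatically. A sketch with a hypothetical helper; the link stays relative to the portal root, as in the examples above.

```python
from urllib.parse import urlencode


def portal_link(query, cancer_study_id=None, report=None):
    """Build a stable ln link; omitting cancer_study_id gives a cross-cancer query."""
    params = {}
    if cancer_study_id is not None:
        params["cancer_study_id"] = cancer_study_id
    params["q"] = query
    if report is not None:
        params["report"] = report  # full (default) or oncoprint_html
    return "ln?" + urlencode(params)
```

portal_link("EGFR NF1", cancer_study_id="gbm_tcga") reproduces the GBM link above. For portal_link("EGFR:MUT"), urlencode percent-encodes the colon, giving ln?q=EGFR%3AMUT, which is decoded back to EGFR:MUT before the server reads the parameter.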



