NCBI Comparative Genomics Codeathon: Technical Help for Participants
For general support, please use the #help-desk channel in Slack.
GitHub
1. Team Repositories
All code generated by your team should be included in your team’s repository on GitHub. The repository must include a README that documents your team’s objectives and products. The contents of the repository should be sufficient to reproduce and build on your work.
GitHub repositories for this codeathon follow the pattern https://github.com/NCBI-Codeathons/cg-2025-team-* (where * is the team lead's last name). Participants who provided their GitHub username in their application and RSVP should have received an invitation to their team's repository. Look for an email with the subject line "Invitation to collaborate on [repository name]". The invitation link expires in 7 days. Team leads were set as admins on their team repositories, so if you need access, please ask your team or post in #help-desk. If you run into any permission issues during the event, please report them in #help-desk.
Examples of previous NCBI codeathon repositories:
- https://github.com/NCBI-Codeathons/mlxai-2024-team-beaumont
- https://github.com/NCBI-Codeathons/pubmed-codeathon-team1
- https://github.com/NCBI-Hackathons/PubRunner
- https://github.com/NCBI-Hackathons/MetagenomicAntibioticResistance
2. Git/GitHub Resources
Here are some resources for learning git or refreshing your knowledge:
GCP Quickstart guide to help beginners
This guide provides a step-by-step introduction to GCP, covering essential concepts and tasks. It's a great starting point for those unfamiliar with cloud computing or GCP specifically.
Key topics covered in the Quickstart guide include creating a GCP account, creating a project, creating a compute instance, connecting to the instance, and running a basic command.
GCP Tutorials
These tutorials will guide you through using specific GCP services that might be relevant to the codeathon.
Creating a Compute Engine Instance: This tutorial guides you through creating a virtual machine (VM) on Google Cloud Platform (GCP). VMs are ideal for tasks that require significant processing power or memory, such as running complex applications or data analysis pipelines.
Using Cloud Storage: This overview provides information on storing and accessing data in Google Cloud Storage. Cloud Storage is suitable for files of any size, such as images, videos, documents, and data backups.
Creating a Bucket: A bucket is a container for objects in Cloud Storage. This tutorial explains how to create one. Buckets organize data within Cloud Storage, enabling efficient management and access.
Transferring Data to Cloud Storage: This blog post outlines several methods for transferring data to Cloud Storage, including Cloud Storage transfer tools, Storage Transfer Service, Transfer Appliance, and BigQuery Data Transfer Service.
Creating a Cloud SQL Instance: Cloud SQL is a managed relational database service. This tutorial demonstrates how to create a MySQL instance. Cloud SQL is ideal for storing and managing structured data for web applications or other applications.
STRIDES GCP Tutorial Resources for Biomedical Research: This page offers tutorials on running biomedical workflows on Google Cloud Platform (GCP), covering machine learning, medical imaging, genomics, and more. Resources are organized by research topic for easy access to relevant tools and workflows.
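As a companion to the Cloud Storage tutorials listed above, here is a minimal Python sketch of uploading one file to an existing bucket. It assumes the third-party google-cloud-storage package is installed and that the VM or your shell already has Application Default Credentials configured. The helper names object_uri and upload_file are hypothetical and were chosen for this example only. Treat it as a starting point under those assumptions, not as the codeathon's required workflow.

```python
from pathlib import Path


def object_uri(bucket_name, object_name):
    """Build the gs:// URI that identifies an object in a bucket."""
    return f"gs://{bucket_name}/{object_name.lstrip('/')}"


def upload_file(bucket_name, local_path, object_name=None):
    """Upload one local file to an existing bucket and return its gs:// URI.

    Assumes the bucket already exists and that credentials are available
    through Application Default Credentials.
    """
    # Imported here so object_uri() works without the third-party package.
    from google.cloud import storage

    # Default the object name to the local file name.
    object_name = object_name or Path(local_path).name
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(str(local_path))
    return object_uri(bucket_name, object_name)
```

For example, upload_file("my-team-bucket", "results/summary.tsv") would upload the file as summary.tsv and return gs://my-team-bucket/summary.tsv. Both the bucket name and the file path are placeholders.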
Additional Security Considerations
Adding Keys to Instances: Before accessing compute engine instances, it's essential to configure secure access using SSH keys. This document explains the process.
Setting Up BigQuery Access: BigQuery offers granular access control to ensure data security. This documentation covers creating and managing access permissions.
GCP for NCBI Codeathon Participants
Policy for Research Costs
Throughout the event, we will support cloud computing costs within the designated codeathon GCP accounts. We cannot reimburse or support costs outside the designated GCP accounts, including third-party libraries, software, or other billable GCP services. Participants are strongly encouraged to use open-source tools and public databases to ensure accessibility and reproducibility.
Security Reminder
During the NCBI codeathon, it's important to prioritize the security of your Google Cloud Platform (GCP) account. This includes keeping your GCP account credentials, such as your API keys or project IDs, private. Sharing this information publicly could lead to unauthorized access to your GCP resources and potential financial charges.
Here are some common ways GCP account credentials might accidentally be made public:
- Uploading code to GitHub: If your code includes GCP project IDs, API keys, or other sensitive information, uploading it to a public GitHub repository could expose your credentials to anyone with internet access.
- Sharing configuration files: Configuration files used for GCP projects might contain API keys or other sensitive details. Be mindful of who you share these files with and avoid posting them online.
- Embedding credentials in logs: Logs generated during development or testing might contain snippets of code that include GCP credentials. Make sure to remove these sensitive details before sharing logs publicly.
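One way to catch the leaks described above before pushing to GitHub is to scan your working directory for strings that look like credentials. The sketch below is illustrative only. The three patterns are assumptions based on commonly seen formats (Google API keys that begin with "AIza", PEM private key headers, and the "type": "service_account" field in service account key files), and they will not catch every kind of secret. The function names scan_text and scan_tree are hypothetical.

```python
import re
from pathlib import Path

# Illustrative patterns only; this is not an exhaustive list of secret formats.
PATTERNS = {
    "Google API key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "private key block": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
    "service account key file": re.compile(r'"type"\s*:\s*"service_account"'),
}


def scan_text(text):
    """Return (line_number, label) pairs for lines that match a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


def scan_tree(root="."):
    """Scan readable text files under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        findings = scan_text(text)
        if findings:
            results[str(path)] = findings
    return results
```

Calling scan_tree() from the root of your repository returns a dictionary that maps each flagged file to the line numbers and labels that matched. An empty result does not prove the repository is clean, so still review files by hand before making anything public.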
Policy for Extended Cloud Access
Extended cloud access is available on a case-by-case basis and must be requested in advance. This extension is intended primarily for finishing critical computations and downloading necessary data after the codeathon concludes.
- Request Deadline: The team lead must notify organizers of the need for extended access by Friday, September 12th at noon.
- Maximum Extension Period: Access can be extended until the following Wednesday, September 17th at midnight ET.
- Spending Limits: Spending alerts will be configured, and any expenses exceeding $1,000 during the extended period will trigger the automatic termination of running jobs and machines to prevent excessive charges.
Last Reviewed: May 6, 2025