Could GitHub Copilot produce a vulnerable code?

YevhSec1
7 min readAug 24, 2022

--

GitHub Copilot is an interesting solution that promises to simplify the developer’s day-to-day tasks. This short article aims to answer the question: Could GitHub Copilot produce a vulnerable code? Could GitHub Copilot produce some sensitive information that may be real? And how the GitHub Copilot can be improved to prevent producing vulnerable code.

Article structure

  1. Preface
  2. GitHub Copilot could suggest some sensitive data?
  3. GitHub Copilot could produce a vulnerable code?
  4. Improvement suggestions

1) Preface

The idea of writing code with the help of artificial intelligence is not new. For example, you can remember Engineer.ai, an Indian startup claiming to have built an artificial intelligence-assisted app development platform, but in fact, is not using AI to literally build apps, according to a report from The Wall Street Journal.

On the contrary, GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real-time, right from your editor. Trained on billions of lines of code, GitHub Copilot turns natural language prompts into coding suggestions across dozens of languages.

Can GitHub Copilot introduce insecure code in its suggestions?

From the official GitHub Copilot page:

Public code may contain insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. This is something we care a lot about at GitHub, and in recent years we’ve provided tools such as GitHub Actions, Dependabot, and CodeQL to open source projects to help improve code quality. Of course, you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment.

2) The GitHub Copilot could suggest some sensitive data?

In this context, I was focused on whether the GitHub Copilot may suggest some valid API keys, credit card numbers, weak passwords, etc.

  1. Valid email address

GitHub Copilot does not make suggestions for email prefixes. Do it only for email domains like “@gmail.com”, “@outlook.com”, etc.

2. Weak passwords

GitHub Copilot suggests weak passwords. Just compare results with any top weak passwords rating

Interesting that it can be context-based such as first and last name.

2. API Tokens

GitHub Copilot doesn’t suggest a valid tokens.

Google Maps

Algoria

Gitlab

AWS Access Key ID

Twitter API Secret

Twilio Account_sid

HockeyApp API Token

3. Credit card numbers

In some cases, GitHub Copilot may suggest valid credit card numbers.

4. Credit card expiration date

GitHub Copilot doesn’t suggest a valid credit card expiration date.

4. Phone number

In some cases, GitHub Copilot suggests valid phone numbers.

5. Bitcoin address

In some cases, GitHub Copilot suggests valid Bitcoin address.

3) The GitHub Copilot could produce a vulnerable code?

1. Broken Authentication in NodeJS

Let’s imagine you want to create a route of an “/admin” endpoint.

GitHub Copilot suggests the following code:

Note: In this case, the endpoint is available to anyone without authentication.

If we explicitly try to plug the authentication check in a route, the GitHub Copilot suggests the following:

Conclusion: An authentication check will be suggested only if you explicitly try to plug it.

2. SQL Injection in Python

Let’s imagine you want to check the user-provided credentials against the SQL database in the Flask application. GitHub Copilot suggests the following code:

Since SQL query is built by concatenating “username” and “password”, the login mechanism can be bypassed as follows:

If we start writing secure code example using parameterized queries, GitHub Copilot suggests the following code

Conclusion: Secure code example using parameterized queries will not be suggested first.

3. OS Command Injection in NodeJS

GitHub Copilot suggests to use the execSync (synchronous version of exec) to run a system command with unsanitized user input:

If we start writing secure code example using “spawnSync” method. GitHub Copilot suggests the following code.

Conclusion: Secure code example not be suggested first.

4. Unsafe Deserialization in NodeJS

Node.js does not provide advanced forms of object serialization. Nevertheless, the JSON format is often used to convert data objects from/to a string representation.

GitHub Copilot suggests secure variant using “JSON.parse”

An insecure variant using the “eval” function may suggest only if we start writing this.

Conclusion: Secure code example suggested first.

4) Improvement suggestions

At this moment, GitHub Copilot may be effectively used by developers who already know how to write secure code. In most cases, the insecure variant is suggested first. Of course, this can be prevented by adding additional controls into the workflow, such as SAST, Security Code review, etc.

In my opinion GitHub Copilot can be improved by the following:

  1. Suggesting a static data(API keys, credit cards numbers, etc)
  2. Forbidden to suggest a weak passwords
  3. Suggest a secure coding pattern first

Read more

Implementing Application Security on your project

DevSecOps — What Security Controls exist and when to implement them?

iOS Apps Security scanners practical comparison

OSCP Preparation

Follow me and stay secure!

--

--

YevhSec1

MSc in Cyber Security, OSCP, eWPTXv2, CEH Master. Awarded by Apple, Trello, Paysera..