GitHub Copilot is an interesting solution that promises to simplify the developer’s day-to-day tasks. This short article aims to answer the question: Could GitHub Copilot produce a vulnerable code? Could GitHub Copilot produce some sensitive information that may be real? And how the GitHub Copilot can be improved to prevent producing vulnerable code.
Article structure
- Preface
- GitHub Copilot could suggest some sensitive data?
- GitHub Copilot could produce a vulnerable code?
- Improvement suggestions
1) Preface
The idea of writing code with the help of artificial intelligence is not new. For example, you can remember Engineer.ai, an Indian startup claiming to have built an artificial intelligence-assisted app development platform, but in fact, is not using AI to literally build apps, according to a report from The Wall Street Journal.
On the contrary, GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real-time, right from your editor. Trained on billions of lines of code, GitHub Copilot turns natural language prompts into coding suggestions across dozens of languages.
Can GitHub Copilot introduce insecure code in its suggestions?
From the official GitHub Copilot page:
Public code may contain insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. This is something we care a lot about at GitHub, and in recent years we’ve provided tools such as GitHub Actions, Dependabot, and CodeQL to open source projects to help improve code quality. Of course, you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment.
2) The GitHub Copilot could suggest some sensitive data?
In this context, I was focused on whether the GitHub Copilot may suggest some valid API keys, credit card numbers, weak passwords, etc.
- Valid email address
GitHub Copilot does not make suggestions for email prefixes. Do it only for email domains like “@gmail.com”, “@outlook.com”, etc.
2. Weak passwords
GitHub Copilot suggests weak passwords. Just compare results with any top weak passwords rating
Interesting that it can be context-based such as first and last name.
2. API Tokens
GitHub Copilot doesn’t suggest a valid tokens.
Google Maps
Algoria
Gitlab
AWS Access Key ID
Twitter API Secret
Twilio Account_sid
HockeyApp API Token
…
3. Credit card numbers
In some cases, GitHub Copilot may suggest valid credit card numbers.
4. Credit card expiration date
GitHub Copilot doesn’t suggest a valid credit card expiration date.
4. Phone number
In some cases, GitHub Copilot suggests valid phone numbers.
5. Bitcoin address
In some cases, GitHub Copilot suggests valid Bitcoin address.
3) The GitHub Copilot could produce a vulnerable code?
1. Broken Authentication in NodeJS
Let’s imagine you want to create a route of an “/admin” endpoint.
GitHub Copilot suggests the following code:
Note: In this case, the endpoint is available to anyone without authentication.
If we explicitly try to plug the authentication check in a route, the GitHub Copilot suggests the following:
Conclusion: An authentication check will be suggested only if you explicitly try to plug it.
2. SQL Injection in Python
Let’s imagine you want to check the user-provided credentials against the SQL database in the Flask application. GitHub Copilot suggests the following code:
Since SQL query is built by concatenating “username” and “password”, the login mechanism can be bypassed as follows:
If we start writing secure code example using parameterized queries, GitHub Copilot suggests the following code
Conclusion: Secure code example using parameterized queries will not be suggested first.
3. OS Command Injection in NodeJS
GitHub Copilot suggests to use the execSync (synchronous version of exec) to run a system command with unsanitized user input:
If we start writing secure code example using “spawnSync” method. GitHub Copilot suggests the following code.
Conclusion: Secure code example not be suggested first.
4. Unsafe Deserialization in NodeJS
Node.js does not provide advanced forms of object serialization. Nevertheless, the JSON format is often used to convert data objects from/to a string representation.
GitHub Copilot suggests secure variant using “JSON.parse”
An insecure variant using the “eval” function may suggest only if we start writing this.
Conclusion: Secure code example suggested first.
4) Improvement suggestions
At this moment, GitHub Copilot may be effectively used by developers who already know how to write secure code. In most cases, the insecure variant is suggested first. Of course, this can be prevented by adding additional controls into the workflow, such as SAST, Security Code review, etc.
In my opinion GitHub Copilot can be improved by the following:
- Suggesting a static data(API keys, credit cards numbers, etc)
- Forbidden to suggest a weak passwords
- Suggest a secure coding pattern first
Read more
Implementing Application Security on your project
DevSecOps — What Security Controls exist and when to implement them?
iOS Apps Security scanners practical comparison
Follow me and stay secure!