2x coding speed https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/

code improves reasoning

According to the post, Claude 2 now scores 71.2% on HumanEval, a significant upgrade from Claude 1.3 (56.0%). (Found in the model card: pass@1; see the pass@k sketch after the list below.)

For comparison:

  • GPT-4 claims 85.4 on HumanEval; in a recent paper https://arxiv.org/pdf/2303.11366.pdf GPT-4 was measured at 80.1 pass@1, and 91 pass@1 using their Reflexion technique. The paper also includes MBPP and LeetCode Hard benchmark comparisons

  • WizardCoder, a StarCoder fine-tune, is one of the top open models, scoring 57.3 pass@1; model card here: https://huggingface.co/WizardLM/WizardCoder-15B-V1.0

  • The best open model I know of at the moment is replit-code-instruct-glaive, a replit-code-3b fine-tune, which scores 63.5% pass@1. An independent developer, abacaj, has reproduced that result as part of code-eval, a repo for getting HumanEval results: https://github.com/abacaj/code-eval
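For reference, pass@k estimates the probability that at least one of k sampled completions per problem passes all unit tests; pass@1 is the k=1 case. A minimal sketch of the unbiased estimator from the HumanEval paper (Chen et al., 2021), assuming numpy:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples drawn per problem
    c: samples that passed all unit tests
    k: sample budget being scored (k=1 for pass@1)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    # 1 - C(n-c, k) / C(n, k), expanded as a numerically stable product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 130 pass -> pass@1 estimate of 0.65
assert abs(pass_at_k(200, 130, 1) - 0.65) < 1e-9
```

Reported scores average this per-problem estimate over all problems in the benchmark.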

Those interested in this area may also want to take a look at this repo https://github.com/my-other-github-account/llm-humaneval-ben... (which also ranks models with Eval+), the CanAiCode Leaderboard https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul..., and airate https://github.com/catid/supercharger/tree/main/airate

Also, as with all LLM evals, these numbers should be taken with a grain of salt:

Liu, Jiawei, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. “Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation.” arXiv, June 12, 2023. https://doi.org/10.48550/arXiv.2305.01210.

Data/Timeline

"Today, GitHub Copilot is behind an average of 46% of a developers’ code across all programming languages—and in Java, that number jumps to 61%."

Known Issues

code models

benchmarks

https://arxiv.org/pdf/2303.06689.pdf MBPP [Austin et al., 2021]: This benchmark, referred to as "Mostly Basic Programming Problems", contains nearly 1,000 crowd-sourced Python programming problems, covering programming fundamentals, standard-library functionality, and more. Each problem in the benchmark consists of an NL description, a code solution, and 3 automated test cases. A portion of the manually verified data is extracted as "MBPP-sanitized". For MBPP, which does not include function signatures, only the NL description is provided as input.

HumanEval [Chen et al., 2021]: This benchmark is a set of 164 handwritten programming problems, proposed by OpenAI. Each problem includes a function signature, an NL description, a function body, and several unit tests, averaging 7.7 tests per problem. For HumanEval, the function signature, NL description, and public test cases are provided as input. Furthermore, the authors utilize expanded versions of MBPP and HumanEval, which include over 100 additional test cases per task, to reinforce the validity of code evaluation [Dong et al., 2023]. These extended versions are referred to as MBPP-ET and HumanEval-ET.
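To make the scoring concrete: a HumanEval problem ships a prompt (signature + docstring), a test string that defines check(candidate), and an entry_point name; a sample passes if the model's completion, appended to the prompt, survives the tests. A minimal sketch (field names follow the released human-eval dataset; real harnesses such as bigcode-evaluation-harness run this in a sandboxed subprocess with timeouts, never bare exec on untrusted output):

```python
def sample_passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if one generated completion passes a HumanEval problem's tests."""
    # prompt = signature + docstring, completion = model-generated body,
    # test = source defining check(candidate), entry_point = function name
    program = f"{prompt}{completion}\n{test}\ncheck({entry_point})\n"
    try:
        exec(program, {})  # WARNING: only ever do this inside a sandbox
        return True
    except Exception:
        return False
```

Counting how many of n samples pass per problem gives the c fed into the pass_at_k estimator above.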

bigcode eval harness https://github.com/bigcode-project/bigcode-evaluation-harness/

products

(alessio's blogpost https://evcrevolution.com/p/evc-10-llm-for-developers)

sourcegraph list https://github.com/sourcegraph/awesome-code-ai

autogenerate PRs

commit msg generation
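No single canonical recipe here, but the core loop is small: read the staged diff, ask a chat model for a summary. A minimal sketch assuming the openai Python client (v1+); the prompt and model name are illustrative, not any particular product's implementation:

```python
import subprocess
from openai import OpenAI

def suggest_commit_message(model: str = "gpt-4") -> str:
    """Draft a one-line commit message from the staged diff."""
    diff = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True, text=True, check=True,
    ).stdout
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Write a one-line conventional-commit message for the diff."},
            {"role": "user", "content": diff[:8000]},  # crude truncation for large diffs
        ],
    )
    return resp.choices[0].message.content.strip()
```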

Test generation

Codium - https://www.codium.ai/blog/codiumai-powered-by-testgpt-accounces-beta-and-raised-11m/ - video demo https://twitter.com/mathemagic1an/status/1638598693623582720
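The products above don't publish their pipelines; as a generic sketch of the idea (same assumed openai client as the commit-message sketch, illustrative prompt and model name, output needs human review):

```python
from pathlib import Path
from openai import OpenAI

def draft_pytest_tests(source_path: str, model: str = "gpt-4") -> str:
    """Ask a chat model to draft pytest tests for a source file."""
    source = Path(source_path).read_text()
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You write pytest unit tests covering happy paths and edge cases."},
            {"role": "user", "content": f"Write pytest tests for this module:\n\n{source}"},
        ],
    )
    return resp.choices[0].message.content

# e.g. Path("test_mymodule.py").write_text(draft_pytest_tests("mymodule.py"))
```

Tools in this space typically go further: they run the generated tests and regenerate or discard the ones that fail rather than accepting the first draft.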

GPT low code

alternative languages

function sdks

misc