2x coding speed https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
code improves reasoning
- starcoder has reasoning abilities https://twitter.com/LoubnaBenAllal1/status/1655932410566168577
- replit too (amasad tweet only source so far)
- yao fu is exploring this actively https://twitter.com/Francis_YAO_/status/1657985409706762241
- Large Language Models trained on code reason better, even on benchmarks that have nothing to do with code. Some good discussion here about the topic:
- linked to coding -> chain of thought
According to the post, Claude 2 now scores 71.2% on HumanEval, a significant upgrade from Claude 1.3 (56.0%). (Found in the model card: pass@1.)
For comparison:
- GPT-4 claims 85.4 on HumanEval; in a recent paper https://arxiv.org/pdf/2303.11366.pdf GPT-4 was tested at 80.1 pass@1, and 91 pass@1 using their Reflexion technique (sketched below). They also include MBPP and LeetCode Hard benchmark comparisons.
- WizardCoder, a StarCoder fine-tune, is one of the top open models, scoring 57.3 pass@1. Model card here: https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
- The best open model I know of atm is replit-code-instruct-glaive, a replit-code-3b fine-tune, which scores 63.5% pass@1. An independent developer, abacaj, has reproduced that result as part of code-eval, a repo for getting HumanEval results: https://github.com/abacaj/code-eval
Those interested in this area may also want to take a look at this repo https://github.com/my-other-github-account/llm-humaneval-ben... that also ranks with Eval+, the CanAiCode Leaderboard https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul... and airate https://github.com/catid/supercharger/tree/main/airate
Also, as with all LLM evals, to be taken with a grain of salt...
Liu, Jiawei, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. “Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation.” arXiv, June 12, 2023. https://doi.org/10.48550/arXiv.2305.01210.
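For context on the Reflexion number above: the technique wraps the model in a generate → run tests → reflect → retry loop. A hedged sketch, with placeholder `llm`/`run_tests` helpers rather than the paper's actual implementation:

```python
# Rough sketch of a Reflexion-style generate -> test -> reflect -> retry loop.
# `llm` and `run_tests` are hypothetical placeholders, not the paper's code.
def llm(prompt: str) -> str:
    raise NotImplementedError("call your completion API here")

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    raise NotImplementedError("run tests in a sandbox, return (passed, error_log)")

def reflexion_codegen(task: str, tests: str, max_attempts: int = 4) -> str:
    code = llm(f"Write a Python function for this task:\n{task}")
    reflections: list[str] = []
    for _ in range(max_attempts):
        passed, log = run_tests(code, tests)
        if passed:
            break
        # Verbal self-reflection on the failure is carried into the next attempt.
        reflections.append(llm(
            f"Task: {task}\nCode:\n{code}\nTest output:\n{log}\n"
            "Explain what went wrong and how to fix it."
        ))
        code = llm(
            f"Task: {task}\nPrevious attempt:\n{code}\n"
            "Reflections so far:\n" + "\n".join(reflections) +
            "\nWrite a corrected solution. Return only code."
        )
    return code
```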
- Nov 2010: Stephen Wolfram: programming with natural language is actually going to work https://writings.stephenwolfram.com/2010/11/programming-with-natural-language-is-actually-going-to-work/
- Oct 2021: Github Copilot technical preview - team of 6 working on it
- Dec 2021: Github Copilot for businesses
- Feb 2022: DeepMind introduced AlphaCode, a transformer pretrained on 86 million programs in 12 programming languages and fine-tuned on entries to coding contests. At inference time, it generates a million candidate solutions and filters out the bad ones. In this way, it retroactively beat more than half of contestants in 10 coding competitions.
- Jun 2022: Github Copilot GA
- Sep 2022: Github Copilot productivity survey
- Sep 2022: BigCode announced https://www.servicenow.com/blogs/2022/bigcode-large-language-models.html
- Oct 2022: The Stack: 3 TB of permissively licensed source code in 30 programming languages https://huggingface.co/datasets/bigcode/the-stack
- Nov 2022: Kite.com public failure https://www.kite.com/blog/product/kite-is-saying-farewell/
- Our diagnosis is that individual developers do not pay for tools. Their manager might, but engineering managers only want to pay for discrete new capabilities, i.e. making their developers 18% faster when writing code did not resonate strongly enough.
- Nov 2022: https://www.codeium.com/blog/beta-launch-announcement
- Dec 2022: reverse engineering copilot https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html#other-random-tidbits
- https://github.com/fauxpilot/fauxpilot This is an attempt to build a locally hosted version of GitHub Copilot. It uses the SalesForce CodeGen models inside of NVIDIA's Triton Inference Server with the FasterTransformer backend.
- Dec 2022: alphacode evaluation https://github.com/norvig/pytudes/blob/main/ipynb/AlphaCode.ipynb
- Jan 2023: Copilot Labs https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-labs
- Feb 2023: https://www.bleepingcomputer.com/news/security/github-copilot-update-stops-ai-model-from-revealing-secrets/ Copilot will introduce a new paradigm called "Fill-In-the-Middle," which uses a library of known code suffixes and leaves a gap for the AI tool to fill, achieving better relevance and coherence with the rest of the project's code (a FIM prompt sketch follows this timeline). Additionally, GitHub has updated the Copilot client to reduce unwanted suggestions by 4.5% for improved overall code acceptance rates. "When we first launched GitHub Copilot for Individuals in June 2022, more than 27% of developers’ code files on average were generated by GitHub Copilot," Senior Director of Product Management Shuyin Zhao said.
"Today, GitHub Copilot is behind an average of 46% of a developers’ code across all programming languages—and in Java, that number jumps to 61%."
- Mar 2023: getting more ambitious with small scripts
- https://simonwillison.net/2023/Mar/27/ai-enhanced-development/
- geoffrey litt stuff
- Mar 2023: Codium AI raises $11M seed - https://www.codium.ai/blog/codiumai-powered-by-testgpt-accounces-beta-and-raised-11m/
- Apr 2023: Replit announces replit-code-v1-3b
- Apr 2023: state of LLM-assisted programming https://www.allendowney.com/blog/2023/04/02/llm-assisted-programming/
- May 2023: Hugging Face/ServiceNow StarCoder https://techcrunch.com/2023/05/04/hugging-face-and-servicenow-release-a-free-code-generating-model/?guccounter=1
- Jun 2023: phi-1 beats ChatGPT at coding with only 1.3B parameters and ~7B tokens of pretraining data (trained for several epochs), about 1/7th of it synthetically generated :O and the rest extremely high-quality "textbook" data https://twitter.com/Teknium1/status/1671336110684012545?s=20
- Aug 2023: NewHope coding model (July release from a Shanghai team) https://twitter.com/mathemagic1an/status/1686814347287486464?s=20
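The "fill-in-the-middle" paradigm mentioned in the Feb 2023 entry above boils down to giving the model the code both before and after the cursor. A minimal sketch using StarCoder's documented FIM sentinel tokens; Copilot's own prompt format is not public, so this is illustrative only:

```python
# Minimal FIM prompt sketch. bigcode/starcoderbase is a gated checkpoint; any
# FIM-trained code model with these sentinel tokens works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoderbase"
prefix = "def median(values):\n    values = sorted(values)\n    "
suffix = "\n    return result\n"

# The model is asked to generate the "middle" that connects prefix and suffix.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```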
- Ryan Salva on how Copilot works + how to gain developer trust https://news.ycombinator.com/item?id=33226515
- https://medium.com/@enoch3712/github-copilot-is-under-the-hood-how-it-works-and-getting-the-best-out-of-it-4699d4dc3cd8
- cushman: 2,048-token context window
- davinci: 4k-token context window
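Those context windows are why Copilot has to budget its prompt aggressively. A rough sketch of the idea (not Copilot's actual code; token counts via tiktoken's Codex-era p50k_base encoding):

```python
# Rough sketch: keep the most recent code that fits the model's context window,
# leaving room for the completion itself.
import tiktoken

enc = tiktoken.get_encoding("p50k_base")

def trim_prefix(prefix: str, context_tokens: int = 2048, reserve: int = 256) -> str:
    """Drop the oldest lines until the prefix fits in (context - reserve) tokens."""
    budget = context_tokens - reserve
    lines = prefix.splitlines(keepends=True)
    kept, used = [], 0
    for line in reversed(lines):          # walk backwards from the cursor
        cost = len(enc.encode(line))
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return "".join(reversed(kept))

# cushman-sized window vs davinci-sized window:
# trim_prefix(file_contents, context_tokens=2048) or context_tokens=4096
```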
- vulnerabilities https://www.spiceworks.com/it-security/security-general/news/40-of-code-produced-by-github-copilot-vulnerable-to-threats-research/
- codex-davinci-002: "Do Users Write More Insecure Code with AI Assistants?" - some vulnerabilities found in C code with 75 participants - media report
- codex-cushman-001 https://arxiv.org/abs/2208.09727
- Github Copilot investigation https://news.ycombinator.com/item?id=33240341
- Users write more insecure code https://arxiv.org/abs/2211.03622 https://info.deeplearning.ai/generated-code-makes-overconfident-programmers-chinas-autonomous-drone-carrier-does-bot-therapy-require-informed-consent-mining-for-green-tech-1
- BigCode (ServiceNow + Hugging Face, in the spirit of BigScience/BLOOM) https://www.servicenow.com/blogs/2022/bigcode-large-language-models.html
- Salesforce CodeGen
- Nijkamp, E., Pang, B., Hayashi, H., Tu, L., Wang, H., Zhou, Y., Savarese, S., and Xiong, C. (2022). CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
- Nijkamp, E., Hayashi, H., Xiong, C., Savarese, S., and Zhou, Y. (2023). CodeGen2: Lessons for training LLMs on programming and natural languages. arXiv preprint arXiv:2305.02309.
- CodeGen 2.5
- The Stack (dataset from BigCode)
- Li, R., Allal, L. B., Zi, Y., Muennighoff, N., Kocetkov, D., Mou, C., Marone, M., Akiki, C., Li, J., Chim, J., et al. (2023). Starcoder: may the source be with you! arXiv preprint arXiv:2305.06161.
- https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base
https://arxiv.org/pdf/2303.06689.pdf
- MBPP [Austin et al., 2021]: "Mostly Basic Programming Problems" contains nearly 1,000 crowd-sourced Python programming problems covering programming fundamentals, standard library functionality, and more. Each problem consists of an NL description, a code solution, and 3 automated test cases. A manually verified portion is extracted as "MBPP-sanitized". Since MBPP does not include function signatures, only the NL description is provided as input.
- HumanEval [Chen et al., 2021]: a set of 164 handwritten programming problems proposed by OpenAI. Each problem includes a function signature, NL description, function body, and several unit tests, with an average of 7.7 tests per problem. The function signature, NL description, and public test cases are provided as input. There is also an expanded version of MBPP and HumanEval with over 100 additional test cases per task to reinforce the validity of code evaluation [Dong et al., 2023], referred to as MBPP-ET and HumanEval-ET.
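For reference, the pass@k numbers quoted throughout this page come from the unbiased estimator in the HumanEval paper (Chen et al., 2021): sample n completions per problem, count the c that pass the tests, and estimate the chance that at least one of k samples would pass.

```python
# Unbiased pass@k estimator from the HumanEval paper:
# pass@k = 1 - C(n-c, k) / C(n, k), computed stably as a running product.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples for a problem, 37 of which pass all tests:
print(pass_at_k(n=200, c=37, k=1))   # 0.185
```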
bigcode eval harness https://github.com/bigcode-project/bigcode-evaluation-harness/
(alessio's blogpost https://evcrevolution.com/p/evc-10-llm-for-developers)
sourcegraph list https://github.com/sourcegraph/awesome-code-ai
- tensai refactor pr codegen https://twitter.com/mathemagic1an/status/1610023513334878208?s=46&t=HZzqUlCKP3qldVBoBwEzZg
- Magic https://techcrunch.com/2023/02/06/magic-dev-code-generating-startup-raises-23m/
- unmaintained
- Code IDEs
- Introducing Cursor!! (https://cursor.so) - Cursor IDE https://twitter.com/amanrsanger/status/1615539968772050946
- why is this not a vscode extension?
- https://idx.dev/ Project IDX is an entirely web-based workspace for full-stack application development, complete with the latest generative AI (powered by Codey and PaLM 2), and full-fidelity app previews
- E2b - from vasek https://github.com/e2b-dev/e2b
- the pandas extension thing - https://github.com/approximatelabs/sketch
- built on lambdaprompt https://github.com/approximatelabs/lambdaprompt
- pandas dataframe chat https://github.com/gventuri/pandas-ai
- Marvin AI from Prefect (prefect.io)
- custom languages
- LMQL
- https://github.com/georgia-tech-db/eva EVA DB is an AI-SQL database system for developing applications powered by AI models. We aim to simplify the development and deployment of AI-powered applications that operate on structured (tables, feature stores) and unstructured data (videos, text, podcasts, PDFs, etc.). EVA DB accelerates AI pipelines by 10-100x using a collection of performance optimizations inspired by time-tested SQL database systems, including data-parallel query execution, function caching, sampling, and cost-based predicate reordering. EVA supports an AI-oriented SQL-like query language tailored for analyzing both structured and unstructured data. It has first-class support for PyTorch, Hugging Face, YOLO, and Open AI models.
- https://github.com/alantech/marsha LLM-based programming language. Describe what you want done with a simple syntax, provide examples of usage, and the Marsha compiler will guide an LLM to produce tested Python software.
- copilot labs
- http://www.useadrenaline.com/ Show HN: Fully LLM powered code repair – fix and explain your code in seconds
- Gptcommit: Never write a commit message again (with the help of GPT-3)
- yet another https://news.ycombinator.com/item?id=34591733
- https://github.com/Nutlope/aicommits - or chadCommit inside vscode
- https://github.com/di-sukharev/opencommit
- https://github.com/paul-gauthier/aider
- vscode extensions
- santacoder typosaurus https://twitter.com/corbtt/status/1616270918774575105?s=46&t=ZSeI0ovGBee8JBeXEe20Mg semantic linter that spots errors in code
- GPT Prompt Engineer https://github.com/mshumer/gpt-prompt-engineer
- Buildt - AI-powered search allows you to find code by searching for what it does, not just what it is.
- https://www.grit.io/
- codegen ai
- Continue.dev VSCode downloads ~15K, Rift ~2,100
- morph labs rift
- qqbot - dan robinson?
- YC
- code generation - second.dev https://news.ycombinator.com/item?id=35083093
- Pygma is used to convert Figma mockups into production-ready code.
- code search
- Phind https://news.ycombinator.com/item?id=35543668
- bloop - AI code search https://news.ycombinator.com/item?id=34892541
- private code search w animation
- https://news.ycombinator.com/item?id=36260961
- sourcegraph cody
- buildt stackoverflow.gg https://twitter.com/bentossell/status/1622513022781587456
- What comes after Copilot? My take: a conversation with your codebase! Introducing Tensai, your repo-level code assistant http://TensaiCode.com - jay hacks
- Tabby - Self Hosted GitHub Copilot https://news.ycombinator.com/item?id=35470915
- codecomplete - ycw23 - copilot for enterprise https://news.ycombinator.com/item?id=35152851
- CodeComplete offers an experience similar to Copilot; we serve AI code completions as developers type in their IDEs. However, instead of sending private code snippets to GitHub or OpenAI, we use a self-hosted LLM to serve code completions. Another advantage with self-hosting is that it’s more straightforward to securely fine-tune to the company’s codebase. Copilot suggestions aren’t always tailored to a company’s coding patterns or internal libraries, so this can help make our completions more relevant and avoid adding tech debt.
- anysphere control.dev - an AI code editor that harnesses the power of GPT-4. It’s a drop-in replacement for VS Code, has context about your closed-source codebase, and it will make you 2x more productive tomorrow.
- socket.dev ai security scanning https://socket.dev/blog/introducing-socket-ai-chatgpt-powered-threat-analysis
- agent writing its own code in a loop https://github.com/pHaeusler/micro-agent
- https://www.grit.io/
- https://twitter.com/MrHunterBrooks/status/1639373651010109442?s=20
- https://github.com/gitstart
- AutoPR, a Github Action that autonomously writes a pull request in response to an issue https://twitter.com/IrgolicR/status/1652451501015457798
- code generation
- codegen.ai
- https://github.com/paul-gauthier/aider
- Sweep.dev https://news.ycombinator.com/item?id=36987454
- https://github.com/di-sukharev/opencommit
- ai-commit
- ai CLI from builderio https://github.com/BuilderIO/ai-shell
- Codium - https://www.codium.ai/blog/codiumai-powered-by-testgpt-accounces-beta-and-raised-11m/ - video demo https://twitter.com/mathemagic1an/status/1638598693623582720
- https://github.com/jbilcke/latent-browser hallucinate by MIME types
- https://github.com/TheAppleTucker/backend-GPT backend is all you need
- https://withsutro.com/ text to app
- https://github.com/jbrukh/gpt-jargon pseudolanguage
- https://github.com/eth-sri/lmql
- https://github.com/microsoft/guidance/
- Python/pydantic https://twitter.com/AAAzzam/status/1671608335001370625
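The tweet above argues for using plain Pydantic models as the schema for structured LLM output. A minimal sketch, assuming Pydantic v2, with the JSON string standing in for a model response:

```python
# Describe the desired structure as a Pydantic model, ask the LLM for JSON
# matching it, then validate.
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    title: str
    priority: int            # 1 (low) .. 3 (high)
    labels: list[str] = []

raw = '{"title": "Fix login bug", "priority": 3, "labels": ["auth"]}'  # pretend this came from the model

try:
    ticket = Ticket.model_validate_json(raw)
    print(ticket)
except ValidationError as err:
    # In practice, the validation error is fed back to the model for a retry.
    print(err)
```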
- The size of all code/history on GitHub public repos is 92TB; the size of Google's monorepo in 2015 was 86TB (of much higher quality code). If Google were willing to deploy code models trained on their own data, they'd have a noticeable advantage over everyone else. https://twitter.com/amanrsanger/status/1656696500339249153
- https://arxiv.org/pdf/2303.06689.pdf importance of planning in codegen
- maybe use tree of thoughts
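A hedged sketch of the plan-first idea from that paper: two calls, one to outline the steps and one to generate code conditioned on the plan. `llm` is a placeholder for any completion API, not the paper's implementation.

```python
# Two-stage plan-then-code generation, as a minimal sketch.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your completion API here")

def plan_then_code(task: str) -> str:
    plan = llm(
        "Break this programming task into short numbered steps.\n"
        f"Task: {task}\nSteps:"
    )
    return llm(
        f"Task: {task}\nPlan:\n{plan}\n"
        "Write Python code that follows the plan step by step. Return only code."
    )
```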
- CLI https://twitter.com/SpellcraftAI/status/1593393643305459712