Difference between revisions of "Free Software Directory:Artificial Intelligence Team"
Older revision: GrahamxReed (talk | contribs) m (removing "license" from a few entries on tables due to it being implied by the header)
Newer revision: GrahamxReed (talk | contribs) m (adding stable-diffusion.cpp and descriptions for text-to-speech)
Line 232 (older) | Line 232 (newer):
  | [https://github.com/huggingface/diffusers/blob/main/LICENSE Apache-2.0]
  | State-of-the-art diffusion models for image and audio generation in PyTorch
+ |-
+ ! [https://github.com/leejet/stable-diffusion.cpp stable-diffusion.cpp]
+ | leejet
+ | [https://github.com/leejet/stable-diffusion.cpp?tab=MIT-1-ov-file#readme MIT]
+ | Stable Diffusion in pure C/C++
  |-
  |}
Line 319 (older) | Line 324 (newer):
  ! Credit
  ! License
+ ! Description
  |-
  ! [https://github.com/suno-ai/bark Bark]
  | Suno AI
  | [https://github.com/suno-ai/bark/blob/main/LICENSE MIT]
+ | Text-Prompted Generative Audio Model
  |-
  ! [https://github.com/coqui-ai/TTS Coqui TTS]
  | Coqui AI
  | [https://github.com/coqui-ai/TTS/blob/dev/LICENSE.txt MPL 2.0]
+ | 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  |-
  ! [https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI Retrieval-based-Voice-Conversion-WebUI]
  | RVC-Project
  | [https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI#MIT-1-ov-file MIT]
+ | Voice data <= 10 mins can also be used to train a good VC model!
  |-
  ! [https://github.com/voicepaw/so-vits-svc-fork SoftVC VITS Singing Voice Conversion Fork]
  | voicepaw
  | [https://github.com/voicepaw/so-vits-svc-fork/blob/main/LICENSE Apache 2.0 & MIT]
+ | so-vits-svc fork with realtime support, improved interface and more features.
  |-
  ! [https://github.com/neonbjb/tortoise-tts TorToiSe]
  | neonbjb (James Betker)
  | [https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE Apache 2.0]
+ | A multi-voice TTS system trained with an emphasis on quality
  |-
  |}
Revision as of 21:05, 4 March 2024
The Artificial Intelligence Project Team gathers free software resources related to machine learning and artificial intelligence.
Group info | | | User info | | | |
---|---|---|---|---|---|---|
User | Role | Reference | Real name | libera.chat nick | Time zone | Title |
David_Hedlund | Coordinator | | David Hedlund | David_Hedlund | Europe/Stockholm | |
GrahamxReed | Collaborator | | Graham Reed | Graham_Reed | America/New_York | |
Mertgor | Observer | | Mert Gör | hwpplayer1 | Europe/Istanbul | |
Mmcmahon | Team captain | | Michael McMahon | thomzane | America/New_York | FSF Systems Administrator |
Truthfulness
- US NIST AI Risk Management Framework Playbook
- Benchmark paper: TruthfulQA: Measuring How Models Mimic Human Falsehoods
Transformers
- Library for downloading and running pretrained models: Transformers - https://github.com/huggingface/transformers/
- Original research paper: Attention Is All You Need
Text
Grammar
Translation
Text generation
Front ends
Project | Credit | License | Description |
---|---|---|---|
Agnai | agnaistic | AGPLv3 | AI agnostic (multi-user and multi-bot) chat with fictional characters |
ExLlamaV2 | turboderp | MIT | A fast inference library for running LLMs locally on modern consumer-class GPUs |
ExLlamaV2 WebUI | turboderp | MIT | |
GPT4All | Nomic AI | MIT | Run open-source LLMs anywhere |
KoboldAI | KoboldAI | AGPLv3 | A browser-based front-end for AI-assisted writing with multiple local & remote AI models |
koboldcpp | LostRuins | AGPLv3 | A simple one-file way to run various GGML and GGUF models with KoboldAI's UI |
Serge | serge-chat | MIT | A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API. |
SillyTavern | SillyTavern | AGPLv3 | LLM frontend for power users |
Text Generation Web UI | oobabooga | AGPLv3 | A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), LLaMA models |
Models
Project | Credit | License | Description |
---|---|---|---|
llama.cpp | ggerganov (Georgi Gerganov) | MIT | Port of Facebook's LLaMA model in C/C++ |
Mistral | Mistral AI | Apache 2.0 | Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B. |
Mixtral | Mistral AI | Apache 2.0 | Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. |
Other
Project | Credit | License | Description |
---|---|---|---|
GPT-NeoX | EleutherAI | Apache 2.0 | EleutherAI's library for training large-scale language models on GPUs |
FastChat | lm-sys | Apache-2.0 | An open platform for training, serving, and evaluating large language models |
Open Assistant | LAION-AI | Apache 2.0 | A chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so |
Open Interpreter | KillianLucas | AGPLv3 | Lets LLMs run code (Python, JavaScript, Shell, and more) locally |
Honorable mention: the LLaMA 1 code was released under GPLv3.
Code generation
This concept is controversial. See the FSF's other writing on this topic.
- CodeGen by Salesforce FSD | BSD 3-Clause "New" or "Revised" License
- TurboPilot by ravenscroftj (James Ravenscroft) FSD | BSD 3-Clause "New" or "Revised" License
- CodeGen2 by Salesforce FSD | Apache-2.0
Images
Image generation
Text to image GUI
Project | Credit | License | Description |
---|---|---|---|
ComfyUI | comfyanonymous | GPLv3 | Modular Stable Diffusion GUI, API, and backend with a graph/nodes interface |
ControlNet for Stable Diffusion WebUI | Mikubill (Kakigōri Maker) | GPLv3 | An AUTOMATIC1111 extension that adds ControlNet to the original Stable Diffusion model to generate images |
Stable Diffusion WebUI | AUTOMATIC1111 | AGPLv3 | Stable Diffusion web UI |
Additional libraries
Project | Credit | License | Description |
---|---|---|---|
ControlNet | lllyasviel | Apache-2.0 | Adding conditional control to text-to-image diffusion models |
Diffusers | huggingface | Apache-2.0 | State-of-the-art diffusion models for image and audio generation in PyTorch |
stable-diffusion.cpp | leejet | MIT | Stable Diffusion in pure C/C++ |
Legacy
Project | Credit | License | Description |
---|---|---|---|
DALL-E Mini | borisdayma (Boris Dayma) | Apache 2.0 | Generate images from a text prompt |
neural-style | anishathalye | GPLv3 | An implementation of neural style in TensorFlow |
Videos
Image generation techniques build each picture by estimating and removing noise. In video, leftover noise shows up as artifacts and makes objects flicker or drift from one frame to the next (poor temporal stability). These projects tackle that issue.
Project | Credit | License | Description |
---|---|---|---|
FILM: Frame Interpolation for Large Motion | Google Research | Apache 2.0 | A unified single-network approach to frame interpolation that doesn't use additional pre-trained networks |
MagicAnimate | MagIC Research | BSD-3-Clause | Temporally consistent human image animation using a diffusion model |
MotionCtrl | ARC Lab, Tencent PCG | Apache 2.0 | A unified and flexible motion controller for video generation |
TemporalKit | CiaraStrawberry (Ciara Rowles) | GPLv3 | An all in one solution for adding temporal stability to a Stable Diffusion render via an AUTOMATIC1111 extension |
Text To Video Synthesis Colab | camenduru | The Unlicense | A text-to-video synthesis model that evolves from a text-to-image synthesis model |
Thin-Plate Spline Motion Model for Image Animation | yoyo-nb | MIT | Animates a static object in a source image according to a driving video |
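The temporal stability problem described above can be illustrated with a toy sketch. It is not how any of the listed projects work: the flat gray "scene", the `flicker` metric, and the exponential-moving-average smoother are all assumptions chosen for illustration. It only shows that independent per-frame noise produces frame-to-frame flicker, and that even naive smoothing over time reduces it.

```python
import random


def make_noisy_frames(num_frames, num_pixels, noise_std, seed=0):
    """Simulate a static scene: each frame is a flat gray image (0.5)
    plus independent Gaussian noise, as if frames were generated separately."""
    rng = random.Random(seed)
    return [
        [0.5 + rng.gauss(0.0, noise_std) for _ in range(num_pixels)]
        for _ in range(num_frames)
    ]


def flicker(frames):
    """Mean absolute per-pixel change between consecutive frames."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += abs(b - a)
            count += 1
    return total / count


def temporal_smooth(frames, alpha=0.3):
    """Naive stabilizer (hypothetical): exponential moving average over time."""
    smoothed = [list(frames[0])]
    for frame in frames[1:]:
        prev = smoothed[-1]
        smoothed.append(
            [alpha * x + (1 - alpha) * p for x, p in zip(frame, prev)]
        )
    return smoothed


frames = make_noisy_frames(num_frames=30, num_pixels=200, noise_std=0.1)
raw_flicker = flicker(frames)
smooth_flicker = flicker(temporal_smooth(frames))
print(f"flicker before smoothing: {raw_flicker:.4f}")
print(f"flicker after smoothing:  {smooth_flicker:.4f}")
```

Smoothing like this blurs real motion as well as noise, which is one reason dedicated approaches such as frame interpolation and motion-aware models exist.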
- EfficientViT by MIT HAN Lab | Apache 2.0
- LLaVA by Haotian Liu | Apache 2.0
- MiniGPT-4 by Vision-CAIR | BSD 3-Clause "New" or "Revised" License & BSD 3-Clause License
3D modeling
- CLIP-MESH by NasirKhalid24 | MIT
- threestudio | Apache 2.0
Audio
Natural language processing (NLP)
Transcription (speech to text, STT)
- Vosk
- Whisper
- OpenAI's Whisper model ported to C/C++ by ggerganov (Georgi Gerganov) | MIT License
Speech synthesis (text to speech, TTS)
Project | Credit | License | Description |
---|---|---|---|
Bark | Suno AI | MIT | Text-Prompted Generative Audio Model |
Coqui TTS | Coqui AI | MPL 2.0 | 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production |
Retrieval-based-Voice-Conversion-WebUI | RVC-Project | MIT | Voice data <= 10 mins can also be used to train a good VC model! |
SoftVC VITS Singing Voice Conversion Fork | voicepaw | Apache 2.0 & MIT | so-vits-svc fork with realtime support, improved interface and more features. |
TorToiSe | neonbjb (James Betker) | Apache 2.0 | A multi-voice TTS system trained with an emphasis on quality |
Music
- Moseca separates music tracks into stems (voice, drums, bass, guitar, piano, and others), which is useful for remixing, karaoke, and music studies. | MIT
Uncategorized
- Virtual assistant: Mycroft
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.