Difference between revisions of "Free Software Directory talk:Artificial Intelligence Team"
GrahamxReed (talk | contribs) m (stability AI note)
GrahamxReed (talk | contribs) m (→Text: Simple Bench)
Latest revision as of 00:00, 3 September 2024
Free software replacements that are missing
- AI Research Assistant
- https://elicit.org/ - Elicit uses language models to help you automate research workflows, like parts of literature review.
- Voice to instrument: Tone Transfer-like
- Identification
- Photo
- Pl@ntNet for Android - Pl@ntNet is a citizen science project for automatic plant identification through photographs and based on machine learning. "The observations shared by the community are published with the associated images under a Creative Common CC-BY-SA license (visible author name)." - https://plantnet.org/en/2020/08/06/your-plntnet-data-integrated-into-gbif/
- Audio
- Shazam: Shazam is an application that can identify music, movies, advertising, and television shows, based on a short sample played and using the microphone on the device.
- A Shazam-like program that identifies genres instead of songs.
- A free app that functions like midomi.com -- "You can find songs with midomi and your own voice. Forgot the name of a song? Heard a bit of one on the radio? All you need is your computer's microphone."
- Photo
- http://design.rxnfinder.org/addictedchem/prediction/
Potential Freedom issues
- Dependencies need to be checked. (e.g. LibreChat)
- Verify whether a workflow requires a non-free GPU stack or whether a CPU can be used.
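- The training data often contains non-free licensed material.
  - According to current copyright laws, this does not affect the license of the model or of its output; the output is public domain. Mmcmahon (talk) 11:48, 2 May 2023 (EDT)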
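Legal stuff
- UN Resolution: Seizing the opportunities of safe, secure and trustworthy artificial intelligence systems for sustainable development - https://documents.un.org/doc/undoc/ltd/n24/065/92/pdf/n2406592.pdf?token=cHeoDbvIBOFdwcPVZ6&fe=true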
USA
- FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
- Blueprint for an AI Bill of Rights
Model licenses
There appears to be a swath of custom model licenses used independently of the more standardized software licenses that govern the code used to interact with models. This creates ambiguity over which license applies to the files in any given repo.
Reddit - Security PSA: huggingface models are code. not just data.
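The PSA's point is that many checkpoint formats (for example PyTorch `.bin`/`.ckpt` files) are Python pickles, and unpickling can call arbitrary functions. The minimal sketch below uses only the standard library: `payload` is a hypothetical stand-in for an attacker's code, and `imported_globals` is a naive, illustrative scanner that only looks at the simplest opcode patterns, not a real security tool. Formats such as safetensors sidestep the problem by storing only tensor data.

```python
import pickle
import pickletools

# Records side effects so we can observe that loading ran code.
EXECUTED = []


def payload(message):
    """Hypothetical stand-in for attacker code (could just as well be os.system)."""
    EXECUTED.append(message)
    return message


class MaliciousCheckpoint:
    # pickle stores whatever __reduce__ returns; on load it calls
    # payload("...") instead of rebuilding a plain data object.
    def __reduce__(self):
        return (payload, ("code ran during pickle.loads",))


def imported_globals(blob):
    """Naively list the callables a pickle would import when loaded."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols 0-3: arg is "module name"
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL":  # protocol 4+: module and name pushed as strings
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found


blob = pickle.dumps(MaliciousCheckpoint())
print(imported_globals(blob))  # inspecting the bytes does not execute anything
result = pickle.loads(blob)    # loading does: payload() runs here
print(EXECUTED)
```

Static inspection like `imported_globals` can flag suspicious imports before loading, but it is easy to evade; the safer habit is to avoid loading untrusted pickles at all.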
This video (starting at 16:50) makes a good argument that model checkpoints may not fall under copyright protection, so traditional software licenses that depend on copyright law would be invalid. The video also points out that licensors may try to use contract law in place of copyright. I would advise not using YouTube directly and instead using youtube-dl or Invidious.
Worth noting: Open LLaMA (out-of-date) is an example of a text-generation model that avoids the licensing issues attached to the original LLaMA model card.
StabilityAI keeps (Nov '22) updating (July '23) its licenses (Nov '23), so I've removed them from the main page until they settle down.
GLIGEN and GLIGEN GUI are quite neat, but they come with strict model terms and conditions of use.
Testing model viability
Tools are needed to assess the pros/cons of each model.
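Meta-Leaderboards
- HuggingFace LeaderboardFinder - https://huggingface.co/spaces/leaderboards/LeaderboardFinder
- LLM Explorer - https://llm.extractum.io/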
Leaderboards
Because a model can simply be trained to do well on whatever tests a leaderboard uses, consulting multiple leaderboards is preferable (hence HuggingFace is not on the main page). A more comprehensive evaluation would be a meta-analysis of existing leaderboards.
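One simple way to do such a meta-analysis is to normalize each leaderboard's ranking to a score between 0 and 1 and average per model. The sketch below assumes each board is just an ordered list of model names; the board and model names are placeholders, not real data, and real leaderboards would first need their model names matched up.

```python
from statistics import mean

# Hypothetical rankings (best first); board and model names are placeholders.
LEADERBOARDS = {
    "board_a": ["model-x", "model-y", "model-z"],
    "board_b": ["model-y", "model-x", "model-z"],
    "board_c": ["model-y", "model-z"],  # model-x was not evaluated here
}


def normalized_scores(ranking):
    """Map each model to [0, 1]: 1.0 for first place, 0.0 for last place."""
    if len(ranking) == 1:
        return {ranking[0]: 1.0}
    last = len(ranking) - 1
    return {model: 1 - i / last for i, model in enumerate(ranking)}


def aggregate(boards):
    """Average each model's normalized score over the boards it appears on."""
    scores = {}
    for ranking in boards.values():
        for model, score in normalized_scores(ranking).items():
            scores.setdefault(model, []).append(score)
    rows = [(model, mean(vals), len(vals)) for model, vals in scores.items()]
    # Best average first; the count shows how many boards back each average.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for model, score, n_boards in aggregate(LEADERBOARDS):
    print(f"{model}: {score:.2f} over {n_boards} board(s)")
# model-y: 0.83 over 3 board(s)
# model-x: 0.75 over 2 board(s)
# model-z: 0.00 over 3 board(s)
```

Averaging only over the boards where a model appears can flatter models that were tested on few boards, which is why the count is reported alongside the score.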
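Text
- AlpacaEval - https://tatsu-lab.github.io/alpaca_eval/
- Another LLM Roleplay Rankings - https://rentry.org/ALLMRR
- Ayumi Benchmark ERPv4 - https://ayumi.m8geil.de/erp4_chatlogs/#!/index
- Azure AI Studio Model benchmarks - https://ai.azure.com/explore/benchmarks
- Chaiverse - https://console.chaiverse.com/
- EQ-Bench (Emotional Intelligence Benchmark) - https://eqbench.com/index.html
- EQ-Bench (Creative Writing Benchmark) - https://eqbench.com/creative_writing.html
- Googlesheet of models, AI labs, datasets, and various other ML info by Alan Thompson - https://lifearchitect.ai/models-table
- HuggingFace Open LLM - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- HuggingFace Uncensored General Intelligence - https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
- Kagi Search's llm chess puzzles - https://github.com/kagisearch/llm-chess-puzzles
- LiveBench: A Challenging, Contamination-Free LLM Benchmark - https://livebench.ai/
- LMSYS Chatbot Arena - https://chat.lmsys.org/?leaderboard
- MixEval - https://mixeval.github.io/#leaderboard
- OpenCompass (China) - https://rank.opencompass.org.cn/home
- OpenRouter LLM Rankings - https://openrouter.ai/rankings
- Predibase's Open-source Model Fine-Tuning Leaderboard - https://predibase.com/fine-tuning-index
- SEAL leaderboards - https://scale.com/leaderboard
- Simple Bench - Basic Reasoning - https://simple-bench.com/index.html
- Toqan's ProLLM Benchmarks: Summarization - https://prollm.toqan.ai/leaderboard
- Toqan's ProLLM Benchmarks: Q&A Assistant - https://prollm.toqan.ai/leaderboard
Coding
- Aider - https://aider.chat/docs/leaderboards/
- ClassEval - https://fudanselab-classeval.github.io/leaderboard.html
- CRUXEval - https://crux-eval.github.io/leaderboard.html
- EvalPlus - https://evalplus.github.io/leaderboard.html
- LiveBench: A Challenging, Contamination-Free LLM Benchmark - https://livebench.ai/
- LiveCodeBench - https://livecodebench.github.io/leaderboard.html
- SEAL leaderboards' coding prompt set - https://scale.com/leaderboard/coding
- TabbyML Coding LLMs - https://leaderboard.tabbyml.com/
- Toqan's ProLLM Benchmarks: Coding Assistant - https://prollm.toqan.ai/leaderboard
Voice
- Huggingface TTS Arena: Benchmarking TTS Models in the Wild - https://huggingface.co/spaces/TTS-AGI/TTS-Arena
- Open TTS Tracker - https://github.com/Vaibhavs10/open-tts-tracker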
Benchmarks
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
- PremAI's inference engine benchmarks (hint: the top performers are Nvidia's TensorRT, exllamav2, vllm, and llama.cpp)
Architectures
- Transformers
- Library: Transformers - https://github.com/huggingface/transformers/
- Original paper: Attention Is All You Need
- Hyena
- Mamba
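For reference, the core operation introduced in "Attention Is All You Need" is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The dependency-free sketch below computes it for small lists of vectors; it deliberately omits masking, multiple heads, and the learned projections, and a real implementation would use a tensor library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, row by row."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0], [0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

The second query matches both keys equally, so its output is the plain average of the value rows; the first query leans toward the first value row.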
Ordinal value scales could exist for
- indistinguishability from human creations, e.g. the Human or Not? social Turing chat game
- inference speed
- length of memory
- trivia accuracy
- computation costs: CPU/VRAM/RAM, MHz
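As an example of turning one of these into an ordinal scale, inference speed could be measured in tokens per second and then bucketed into levels. In the sketch below, `generate`, `dummy_generate`, and the threshold values are hypothetical placeholders; substitute whatever interface your local model actually exposes.

```python
import bisect
import statistics
import time


def tokens_per_second(generate, prompt, runs=3, clock=time.perf_counter):
    """Median throughput of a text generator over several runs.

    `generate` is a hypothetical callable returning a list of tokens.
    """
    rates = []
    for _ in range(runs):
        start = clock()
        tokens = generate(prompt)
        elapsed = max(clock() - start, 1e-9)  # guard against a zero-length timing
        rates.append(len(tokens) / elapsed)
    return statistics.median(rates)


def to_ordinal(value, thresholds):
    """Bucket a measurement into levels 0..len(thresholds) (higher is faster)."""
    return bisect.bisect_right(thresholds, value)


# Illustrative cut-offs in tokens/second; these are assumptions, not a standard.
SPEED_THRESHOLDS = [5, 20, 50]


def dummy_generate(prompt):
    """Placeholder model: echoes the prompt as whitespace-separated tokens."""
    return prompt.split()


rate = tokens_per_second(dummy_generate, "one two three four five six seven eight")
print(f"{rate:.0f} tokens/s -> level {to_ordinal(rate, SPEED_THRESHOLDS)}")
```

The `clock` parameter exists so the timing can be replaced with a fake clock in tests; the median is used so one slow warm-up run does not dominate.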
Source of model training data
- amount of data
- date range (e.g. distinguishing old science from new science for smaller scale models)
- level of censorship (important for distinguishing personal and research use from business use)
Problem solving
- math
- creative problem solving (there exists methodology for testing this in humans)
"The present findings suggest that the current state of AI language models demonstrate higher creative potential than human respondents." Scientific Reports, a Nature Portfolio journal (Feb 10, '24)
Computation Costs
- Computation costs and datasets for many models
- LLaMA 2 performance reached with $100k
- Image Stable Diffusion performance reached with $2k
General trends
- Larger models are more prone to human superstition[1], but also produce more human-like, readable text.
- Quantization (à la GPTQ) allows consumer hardware to run large models.
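A back-of-the-envelope calculation shows why quantization matters: weight memory is roughly parameter count × bits per weight ÷ 8 bytes, so 13B parameters take about 26 GB at 16-bit but about 6.5 GB at 4-bit. The sketch below computes that and also shows plain round-to-nearest (RTN) quantization. RTN is a simpler technique than GPTQ, which additionally compensates for rounding error using second-order information; the memory estimate also ignores activations, the KV cache, and per-group scale overhead.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one scale for the whole list."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [x * scale for x in q]


for bits in (16, 8, 4):
    print(f"13B parameters at {bits}-bit: ~{weight_memory_gb(13e9, bits):.1f} GB")

weights = [0.7, -0.35, 0.02, 0.1]
q, scale = quantize_rtn(weights, bits=4)
print(q, [round(w, 3) for w in dequantize(q, scale)])
```

Each weight is off by at most half a quantization step after the round trip; real schemes usually keep a separate scale per small group of weights to keep that step small.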
Stable Diffusion
Stable Diffusion model files (.ckpt) are released under a non-free license.
Stable Diffusion starting point: https://huggingface.co/CompVis/stable-diffusion-v1-4 (model) and https://huggingface.co/spaces/CompVis/stable-diffusion-license (license)
stable-diffusion-webui
- https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Demo and guide - https://www.youtube.com/watch?v=R52hxnpNews
- It depends on the stable-diffusion repository instead of diffusers. The current branch of stable-diffusion only has a model license, which does not make sense for code; older commits are MIT-licensed.
- Extensions:
Large Language Models
Censorship issues
A guide to decensoring models. I would exercise caution: it stands to reason that an inherently uncensored model would perform better than one that needs the legwork of decensoring (which risks introducing mistakes and missing some of the censorship).
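Examples
- If you ask Deepseek-V2 (through the official site) 'What happened at Tiananmen Square?', it deletes your question and clears the context. - https://www.reddit.com/r/LocalLLaMA/comments/1ctiggk/if_you_ask_deepseekv2_through_the_official_site/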
Isolating a web service like Gradio from internet access
This seems useful for limiting snooping on local models: Linux Safe Web Service - https://rentry.org/IsolatedLinuxWebService
Legacy notable projects
- Vicuna 13B - It appears as though this model is inherently censored (https://github.com/lm-sys/FastChat/issues/89)
- Alpaca 13B
- Pygmalion 6B
- CodeGen by Salesforce FSD | BSD 3-Clause "New" or "Revised" License
- CodeGen2 by Salesforce FSD | Apache-2.0
Project | Credit | License | Description |
---|---|---|---|
DALL-E Mini | borisdayma (Boris Dayma) | Apache 2.0 | Generate images from a text prompt |
neural-style | anishathalye | GPLv3 | An implementation of neural style in TensorFlow |
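- Yao Fu's (FranxYao) chain-of-thought hub - https://github.com/FranxYao/chain-of-thought-hub#results
- InfiCoder-Eval - https://infi-coder.github.io/inficoder-eval/
External links
- Applications of artificial intelligence (Wikipedia) - https://en.wikipedia.org/wiki/Applications_of_artificial_intelligence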
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.