Free Software Directory talk:Artificial Intelligence Team
== Software classification for FSF assessment ==

To assist the FSF in evaluating the page at https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications, we organized the software categories here accordingly. This structure aims to clarify the criteria for determining when a machine learning application can be considered free, ensuring that users are empowered to control their computing.
 
=== Examples ===

==== Opening-up-chatgpt.github.io ====

* '''Availability''': Free code, LLM data, LLM weights, RL data, RL weights, License
* '''Documentation''': Code, Architecture, Preprint, Paper, Modelcard, Datasheet
* '''Access''': Package, API

==== USA Bipartisan House Task Force Report on AI (Dec '24) ====

[https://republicans-science.house.gov/_cache/files/a/a/aa2ee12f-8f0c-46a3-8ff8-8e4215d6a72b/E4AF21104CB138F3127D8FF7EA71A393.ai-task-force-report-final.pdf Federal AI Governance and Transparency (Page 4)]

* '''Data and Metadata''': information about the data that was used to train, test, or fine-tune the model, including information about the data’s sources or provenance, collection methods, sample size, procedures for cleaning the data, bias and skewness, inclusion of protected characteristics or proxy features, and ultimate integrity.
* '''Software''': information about the software components and their origins.
* '''Model Development''': information about the training, tuning, validation, and testing of the AI system, who requested its development, who developed the model, communities consulted in development, the development process, tools used in development, the model’s intended uses and known limitations, and metrics pertaining to the model’s efficiency, performance, bias, and energy usage.
* '''Model Deployment''': information about model deployment and monitoring, including metrics identified in model development, plans to provide notice and explanation of the use of AI models to members of the public impacted by the model’s use, and any ongoing training, validation, and testing.
* '''Model Use''': information about how the model is used, including the organizational context and design of the entire system in which the model is deployed, specific use case applications of the model, the information that a deployed model utilizes, the types of determinations or decisions the model is intended to inform, meaningful explanations of the model and its outcomes given relevant stakeholders, the policies for how to handle outputs, the risks of harm identified, and risk mitigation plans including human oversight or intervention.
  
== Mechanics ==

At its core, it is differentiation on N-dimensional arrays (N-dimensional meaning N axes, e.g. x, y, z).
  
Experiment with the three basic variables: predicted, actual, and error (see the sketch after the library lists below).

=== Python ===

==== Array libraries ====
* [https://numpy.org/doc/stable/index.html NumPy]

==== Differentiation libraries ====
* [https://pytorch.org/ PyTorch]
* [https://www.tensorflow.org/ TensorFlow]
* [https://jax.readthedocs.io/en/latest/index.html JAX]
* [https://github.com/rsokl/MyGrad MyGrad]
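
To make the "predicted, actual, error" loop concrete, here is a minimal sketch using NumPy (one of the array libraries above). The data, model, and learning rate are made up for illustration; any of the differentiation libraries above would compute the gradient automatically instead of by hand.

<syntaxhighlight lang="python">
# Fit y = x @ w by hand, playing with exactly the three variables named above:
# predicted, actual, and error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # an N-dimensional array of inputs (100x3)
true_w = np.array([2.0, -1.0, 0.5])
actual = x @ true_w                    # the values we want to reproduce

w = np.zeros(3)                        # model parameters, starting at zero
for step in range(200):
    predicted = x @ w                  # model output
    error = predicted - actual         # how far off the model is
    grad = 2 * x.T @ error / len(x)    # derivative of mean squared error w.r.t. w
    w -= 0.1 * grad                    # gradient-descent update

print(np.round(w, 3))                  # converges to roughly [ 2.  -1.   0.5]
</syntaxhighlight>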
  
== Combatting censorship ==

=== Decentralized LLMs ===

==== Distributed training ====

[https://github.com/PrimeIntellect-ai/prime Prime Intellect] [Apache 2.0]

==== Peer-to-peer generation ====

"[https://github.com/mudler/LocalAI LocalAI] [https://github.com/mudler/LocalAI#MIT-1-ov-file [MIT]] uses https://github.com/libp2p/go-libp2p [MIT] under the hood, the same project powering IPFS. Differently from other frameworks, LocalAI uses peer2peer without a single master server, but rather it uses sub/gossip and ledger functionalities to achieve consensus across different peers.

EdgeVPN [Apache 2.0] is used as a library to establish the network and expose the ledger functionality under a shared token to ease out automatic discovery and have separated, private peer2peer networks.

The weights are split proportional to the memory when running into worker mode, when in federation mode each request is split to every node which have to load the model fully." - [https://localai.io/features/distribute/ source]

=== Isolating a web service like Gradio from internet access ===

Ensures less snooping on local models - [https://rentry.org/IsolatedLinuxWebService Linux Safe Web Service]. A minimal first step, sketched below, is to bind the web UI to the loopback interface only and avoid public share tunnels.
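
As a small illustration of the application-level half of this (the linked guide covers the actual network isolation), a Gradio app can at least be bound to the loopback interface with its public share tunnel disabled. This is only a sketch; the echo function stands in for whatever local model is being served.

<syntaxhighlight lang="python">
# Keep a local Gradio UI reachable only from this machine.
# This does not replace real isolation (firewall rules, network namespaces,
# as in the linked guide); it only avoids listening on public interfaces.
import gradio as gr

def echo(prompt: str) -> str:          # placeholder for a call into a local model
    return prompt

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
demo.launch(
    server_name="127.0.0.1",   # bind to loopback only, not 0.0.0.0
    server_port=7860,
    share=False,               # never open a public gradio.live tunnel
)
</syntaxhighlight>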
  
=== Decensoring censored models ===

[https://erichartford.com/uncensored-models A guide to decensoring models]; exercise caution, as an inherently uncensored model is likely to perform better than one that needs the extra legwork of decensoring (which risks mistakes and missed censorship).

==== Examples of censorship ====

* [https://www.reddit.com/r/LocalLLaMA/comments/1ctiggk/if_you_ask_deepseekv2_through_the_official_site/ If you ask Deepseek-V2 (through the official site) 'What happened at Tienanmen square?', it deletes your question and clears the context.]
  
== Testing models ==

Tools are needed to assess the pros and cons of each model.

=== Meta-Leaderboards ===

* [https://huggingface.co/spaces/leaderboards/LeaderboardFinder HuggingFace LeaderboardFinder]
* [https://llm.extractum.io/ LLM Explorer]

=== Leaderboards ===

Because a model can simply be trained to do well on whatever tests a given leaderboard uses, consulting multiple leaderboards is preferable (hence not putting HuggingFace on the main page). A more comprehensive evaluation would be a meta-analysis of existing leaderboards.
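
To make "a meta-analysis of existing leaderboards" slightly more concrete, here is a toy sketch that averages a model's normalized rank across several leaderboards. The leaderboard names and rankings are invented placeholders; real rankings would have to be exported from the sites listed below.

<syntaxhighlight lang="python">
# Toy meta-analysis: average each model's normalized rank over several leaderboards.
# All rankings below are invented placeholders, not real leaderboard results.
from collections import defaultdict

leaderboards = {
    "leaderboard_a": ["model-x", "model-y", "model-z"],   # best first
    "leaderboard_b": ["model-y", "model-x", "model-z"],
    "leaderboard_c": ["model-y", "model-z", "model-x"],
}

ranks = defaultdict(list)
for ranking in leaderboards.values():
    n = len(ranking)
    for position, model in enumerate(ranking):
        ranks[model].append(position / (n - 1))   # 0.0 = top, 1.0 = bottom

mean_rank = {m: sum(v) / len(v) for m, v in ranks.items()}
for model, rank in sorted(mean_rank.items(), key=lambda kv: kv[1]):
    print(f"{model}: {rank:.2f}")
</syntaxhighlight>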
  
==== Text ====

* [https://tatsu-lab.github.io/alpaca_eval/ AlpacaEval]
* [https://rentry.org/ALLMRR Another LLM Roleplay Rankings]
* [https://ayumi.m8geil.de/erp4_chatlogs/#!/index Ayumi Benchmark ERPv4]
* [https://ai.azure.com/explore/benchmarks Azure AI Studio Model benchmarks]
* [https://console.chaiverse.com/ Chaiverse]
* [https://eqbench.com/index.html EQ-Bench (Emotional Intelligence Benchmark)]
* [https://eqbench.com/creative_writing.html EQ-Bench (Creative Writing Benchmark)]
* [https://lifearchitect.ai/models-table Googlesheet of models, AI labs, datasets, and various other ML info by Alan Thompson]
* [https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard HuggingFace Open LLM]
* [https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard HuggingFace Uncensored General Intelligence]
* [https://github.com/kagisearch/llm-chess-puzzles Kagi Search's llm chess puzzles]
* [https://livebench.ai/ LiveBench: A Challenging, Contamination-Free LLM Benchmark]
* [https://opening-up-chatgpt.github.io/ LLM Openness]
* [https://chat.lmsys.org/?leaderboard LMSYS Chatbot Arena]
* [https://mixeval.github.io/#leaderboard MixEval]
* [https://novelchallenge.github.io/index.html NoCha]
* [https://rank.opencompass.org.cn/home OpenCompass (China)]
* [https://openrouter.ai/rankings OpenRouter LLM Rankings]
* [https://predibase.com/fine-tuning-index Predibase's Open-source Model Fine-Tuning Leaderboard]
* [https://scale.com/leaderboard SEAL leaderboards]
* [https://simple-bench.com/index.html Simple Bench - Basic Reasoning]
* [https://prollm.toqan.ai/leaderboard Toqan's ProLLM Benchmarks: Summarization]
* [https://prollm.toqan.ai/leaderboard Toqan's ProLLM Benchmarks: Q&A Assistant]

===== Coding =====

* [https://aider.chat/docs/leaderboards/ Aider]
* [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard BigCode Models Leaderboard]
* [https://fudanselab-classeval.github.io/leaderboard.html ClassEval]
* [https://crux-eval.github.io/leaderboard.html CRUXEval]
* [https://evalplus.github.io/leaderboard.html EvalPlus]
* [https://livebench.ai/ LiveBench: A Challenging, Contamination-Free LLM Benchmark]
* [https://livecodebench.github.io/leaderboard.html LiveCodeBench]
* [https://huggingface.co/spaces/mike-ravkine/can-ai-code-results Mike Ravkine's CanAiCode Leaderboard 🏆]
* [https://scale.com/leaderboard/coding SEAL leaderboards' coding prompt set]
* [https://leaderboard.tabbyml.com/ TabbyML Coding LLMs]
* [https://prollm.toqan.ai/leaderboard Toqan's ProLLM Benchmarks: Coding Assistant]

==== Voice ====

* [https://artificialanalysis.ai/speech-to-text Artificial Analysis Speech to Text (ASR) Leaderboard]
* [https://huggingface.co/spaces/TTS-AGI/TTS-Arena Huggingface TTS Arena: Benchmarking TTS Models in the Wild]
* [https://github.com/Vaibhavs10/open-tts-tracker Open TTS Tracker]

==== Benchmarks ====

* [https://arxiv.org/abs/2109.07958 TruthfulQA: Measuring How Models Mimic Human Falsehoods]
* [https://github.com/premAI-io/benchmarks PremAI's inference engine benchmarks] Hint: top ones are Nvidia's TensorRT, exllamav2, vllm, llama.cpp
* [https://codeberg.org/jts2323/censorbench censorbench]

==== Architectures ====

* Transformers
** Package manager and player: [[Transformers]] - https://github.com/huggingface/transformers/
** Original science paper: [https://arxiv.org/abs/1706.03762 Attention Is All You Need] (a minimal sketch of its core attention operation follows below)
* Hyena
* Mamba/Jamba
* [https://github.com/vladmandic/automatic/wiki/Models List of popular text-to-image generative models with their respective parameters and architecture overview]
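
Since the Transformer entry above points at "Attention Is All You Need", here is a minimal NumPy sketch of the scaled dot-product attention at its core. Shapes and values are made up for illustration; real implementations add multi-head projections, masking, positional information, and much more.

<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- the core operation of a Transformer layer."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))   # one value vector per key
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8)
</syntaxhighlight>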
  
 
==== Ordinal value scales could exist for ====

Source of model training data
* amount of data
* date range (e.g. distinguishing old science from new science for smaller scale models)
* level of censorship (important to make personal+research use distinct from business use)

Problem solving
* math
* creative problem solving (there exists methodology for testing this in humans)

"The present findings suggest that the current state of AI language models demonstrate higher creative potential than human respondents." [https://doi.org/10.1038/s41598-024-53303-w Nature (Feb 10, '24)]

===== Computation Costs =====

* [https://epochai.org/data/epochdb/table?show=compute-intensive Computation costs and datasets for many models]
* [https://arxiv.org/pdf/2404.07413 LLaMA 2 performance reached with $100k]
* [https://arxiv.org/abs/2407.15811 Image Stable Diffusion performance reached with $2k]
  
 
=== General trends ===

* Larger models are more prone to human superstition[1], but also generate more human-like readability.
* Quantization (a la GPT-Q) allows consumer hardware to run large models; a rough sketch of the idea follows below.
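
As a rough illustration of why quantization shrinks memory requirements, here is a sketch of plain symmetric 8-bit rounding of a weight matrix in NumPy. GPT-Q itself is considerably more sophisticated (group-wise, error-compensating, typically 4-bit), so this only conveys the basic idea.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)   # a made-up float32 weight matrix

scale = np.abs(w).max() / 127.0                               # one scale for the whole tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)   # what gets stored
w_hat = q.astype(np.float32) * scale                          # dequantized at inference time

print(w.nbytes // q.nbytes)               # 4x smaller than float32
print(float(np.abs(w - w_hat).max()))     # worst-case rounding error introduced
</syntaxhighlight>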
  
== Legal ==

* [https://documents.un.org/doc/undoc/ltd/n24/065/92/pdf/n2406592.pdf?token=cHeoDbvIBOFdwcPVZ6&fe=true UN Resolution: Seizing the opportunities of safe, secure and trustworthy artificial intelligence systems for sustainable development]

=== Potential Freedom issues ===

* Dependencies need to be checked. (e.g. [https://docs.librechat.ai/ LibreChat])
* Verify whether a workflow requires a non-free GPU or whether a CPU can be used.
* The training data often contains non-free licensed material.
** According to current copyright laws, this does not impact the license of the model or the output of the model. According to current copyright laws, the output is public domain. [[User:Mmcmahon|Mmcmahon]] ([[User talk:Mmcmahon|talk]]) 11:48, 2 May 2023 (EDT)

[https://www.copyright.gov/ai/ai_policy_guidance.pdf USA copyright AI policy guidance (Mar 16 '23)]

'''Purely generated AI content is not copyrightable'''

"For example, when an AI technology receives solely a prompt[27] from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user."

'''Only the human-generated elements of modifying/arranging AI output are copyrightable'''

"a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.”[33] Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.[34] In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of ” and do “not affect” the copyright status of the AI-generated material itself.[35]"

- [[User:GrahamxReed|GrahamxReed]] ([[User talk:GrahamxReed|talk]]) 23:00, 14 May 2023 (EDT)

==== Model licenses ====

There appears to be a swath of custom model licenses in use, independent of the more standardized software licenses applied to the code that interacts with the models. This creates a conflict over which license applies to the files contained in any given repo.

[https://www.reddit.com/r/LocalLLaMA/comments/13t2b67/security_psa_huggingface_models_are_code_not_just/ Reddit - Security PSA: huggingface models are code. not just data.]
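
The PSA above comes down to the fact that classic PyTorch checkpoints are pickles, and unpickling can execute arbitrary code. The sketch below shows the commonly recommended safer habits; it is a generic illustration rather than anything prescribed by the linked thread, and the file names are placeholders.

<syntaxhighlight lang="python">
# Treat .pt/.bin checkpoints as code from the publisher; .safetensors files
# are plain tensor data and cannot run code when loaded.
import torch
from safetensors.torch import load_file

# Safer: safetensors stores only tensors.
weights = load_file("model.safetensors")

# If a pickle-based checkpoint is unavoidable, restrict what the unpickler
# may construct (supported in recent PyTorch versions).
state_dict = torch.load("pytorch_model.bin", weights_only=True)
</syntaxhighlight>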

[https://youtube.com/watch?v=W5M-dvzpzSQ?t=16m50s This video (starting at 16:50) makes a good argument that model checkpoints may not fall under copyright protection, in which case traditional software licenses that depend on copyright law would be invalid.] The video does note that contract law may be tried in place of copyright. I would advise not using YouTube directly and instead using yt-dl or Invidious.

Worth noting: [https://github.com/openlm-research/open_llama Open LLaMA (out-of-date)] is an example of removing the issue of [https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md model cards] for text generation.

StabilityAI [https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL keeps (Nov '22)] [https://github.com/Stability-AI/generative-models/blob/main/model_licenses/LICENSE-SDXL1.0 updating (July '23)] [https://github.com/Stability-AI/generative-models/blob/main/model_licenses/LICENSE-SDXL-Turbo its licenses (Nov '23)], so I've removed them from the main page until they settle down.

[https://github.com/gligen/GLIGEN GLIGEN] and [https://github.com/mut-ex/gligen-gui GLIGEN GUI] are quite neat, but they come with strict model terms and conditions attached to their use.

== Image Models ==

=== Stable Diffusion ===
 
Stable Diffusion model files (.ckpt) are released under a non-free license.

Here's the stable diffusion beginning point: https://huggingface.co/CompVis/stable-diffusion-v1-4 https://huggingface.co/spaces/CompVis/stable-diffusion-license

* stable-diffusion-webui
** [https://github.com/deforum-art/deforum-for-automatic1111-webui Deforum]
  
== Free software replacements that are missing ==

* AI Research Assistant
** https://elicit.org/ - Elicit uses language models to help you automate research workflows, like parts of literature review.
* Voice to instrument: [https://sites.research.google/tonetransfer Tone Transfer-like]
* Identification
** Photo
*** [https://play.google.com/store/apps/details?id=org.plantnet Pl@ntNet for Android] - Pl@ntNet is a citizen science project for automatic plant identification through photographs, based on machine learning. "The observations shared by the community are published with the associated images under a Creative Commons CC-BY-SA license (visible author name)." - https://plantnet.org/en/2020/08/06/your-plntnet-data-integrated-into-gbif/
** Audio
*** Shazam: Shazam is an application that can identify music, movies, advertising, and television shows, based on a short sample played and using the microphone on the device.
*** A Shazam-like software that identifies genres instead of songs.
*** A free app that functions like midomi.com -- "You can find songs with midomi and your own voice. Forgot the name of a song? Heard a bit of one on the radio? All you need is your computer's microphone."
* http://design.rxnfinder.org/addictedchem/prediction/

== Legacy notable projects ==

* [https://huggingface.co/reeducator/vicuna-13b-free Vicuna 13B] - It appears as though this model is inherently censored [https://github.com/lm-sys/FastChat/issues/89]

=== Unknown license but still noteworthy ===

* [https://huggingface.co/chavinlo/alpaca-13b Alpaca 13B]
* [https://huggingface.co/PygmalionAI/pygmalion-6b Pygmalion 6B]

* [https://github.com/salesforce/CodeGen CodeGen] by Salesforce [[CodeGen|FSD]] | [https://github.com/salesforce/CodeGen/blob/main/LICENSE.txt BSD 3-Clause "New" or "Revised" License]
* [https://github.com/salesforce/CodeGen2 CodeGen2] by Salesforce [[CodeGen2|FSD]] | [https://github.com/salesforce/CodeGen2/blob/main/LICENSE Apache-2.0]

{| class="wikitable sortable"
 +
|-
 +
! Project
 +
! Credit
 +
! License
 +
! Description
 +
|-
 +
! [https://github.com/borisdayma/dalle-mini/ DALL-E Mini]
 +
| borisdayma (Boris Dayma)
 +
| [https://github.com/borisdayma/dalle-mini/blob/main/LICENSE Apache 2.0]
 +
| Generate images from a text prompt
 +
|-
 +
! [https://github.com/anishathalye/neural-style neural-style]
 +
| anishathalye
 +
| [https://github.com/anishathalye/neural-style/blob/master/LICENSE.txt GPLv3]
 +
| An implementation of neural style in TensorFlow
 +
|-
 +
|}
 +
 +
* [https://github.com/FranxYao/chain-of-thought-hub#results Yao Fu's (FranxYao) chain-of-thought hub]
 +
 +
* [https://infi-coder.github.io/inficoder-eval/ InfiCoder-Eval]
  
 
== External links ==

* [https://en.wikipedia.org/wiki/Applications_of_artificial_intelligence Applications of artificial intelligence (Wikipedia)]
