Free Software Directory talk:Artificial Intelligence Team

1 Free software replacements that are missing
2 Potential Freedom issues
- 2.1 Model licenses
3 Testing model viability
- 3.1 Leaderboards
- 3.2 General trends
4 Stable Diffusion
- 4.1 stable-diffusion-webui
5 Large Language Models
- 5.1 Censorship issues
6 Legacy notable projects
7 External links

Free software replacements that are missing

AI Research Assistant
- https://elicit.org/ - Elicit uses language models to help you automate research workflows, like parts of literature review.
Voice to instrument: Tone Transfer-like
Identification
- Photo
  - Pl@ntNet for Android - Pl@ntNet is a citizen science project for automatic plant identification through photographs and based on machine learning. "The observations shared by the community are published with the associated images under a Creative Common CC-BY-SA license (visible author name)." - https://plantnet.org/en/2020/08/06/your-plntnet-data-integrated-into-gbif/
- Audio
  - Shazam: Shazam is an application that can identify music, movies, advertising, and television shows, based on a short sample played and using the microphone on the device.
  - Shazam-like software that identifies genres instead of songs.
  - A free app that functions like midomi.com -- "You can find songs with midomi and your own voice. Forgot the name of a song? Heard a bit of one on the radio? All you need is your computer's microphone."
http://design.rxnfinder.org/addictedchem/prediction/

Potential Freedom issues

Dependencies need to be checked.
Verify whether a workflow requires a non-free GPU or whether a CPU can be used instead.
The training data often contains non-free licensed material.
- According to current copyright laws, this does not impact the license of the model or the output of the model, and the output itself is in the public domain. Mmcmahon (talk) 11:48, 2 May 2023 (EDT)

USA copyright AI policy guidance (Mar 16 '23)

Purely generated AI content is not copyrightable

"For example, when an AI technology receives solely a prompt[27] from a human and produces complex written, visual, or musical works in response, the “traditional elements of authorship” are determined and executed by the technology—not the human user."

Only the human-generated elements of modifying/arranging AI output are copyrightable

"a human may select or arrange AI-generated material in a sufficiently creative way that “the resulting work as a whole constitutes an original work of authorship.”[33] Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection.[34] In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of ” and do “not affect” the copyright status of the AI-generated material itself.[35]"

- GrahamxReed (talk) 23:00, 14 May 2023 (EDT)

Model licenses

There appears to be a swath of custom model licenses being used independent of the more standardized software licenses used to interact with models. This presents a conflict as to what license is deemed applicable to the files contained in any repo.

Reddit - Security PSA: huggingface models are code. not just data.
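
As a rough illustration of why model files can be "code, not just data": checkpoints serialized with Python's pickle format can name arbitrary callables to import and call on load. The sketch below uses only the standard library to list what a pickle stream would import, without unpickling it. The helper name `list_pickle_imports` and the `Payload` class are hypothetical, and the handling of `STACK_GLOBAL` is a heuristic (it assumes the module and attribute names are the two most recent string arguments), so this is an inspection aid, not a security scanner.

```python
import pickle
import pickletools


def list_pickle_imports(data: bytes) -> list[str]:
    """List 'module.name' globals referenced by a pickle, without loading it."""
    imports = []
    strings = []  # string arguments seen so far (needed for STACK_GLOBAL)
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: arg is "module name" in one string.
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as separate strings.
            if len(strings) >= 2:
                imports.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


class Payload:
    """Hypothetical object whose pickle asks the loader to call print()."""

    def __reduce__(self):
        return (print, ("this would run at load time",))


# Dumping does not execute anything; only loading would call print().
payload = pickle.dumps(Payload(), protocol=4)
print(list_pickle_imports(payload))
```

Any entry in the result other than the expected tensor and container types deserves a closer look before the file is loaded.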

This video (starting at 16:50) makes a good argument that model checkpoints may not fall under copyright protection, so traditional software licenses that depend on copyright law would be invalid. The video also explains that contract law may be used in place of copyright. I would advise not using YouTube directly and instead using yt-dl or Invidious.

Worth noting: Open LLaMA (out-of-date) is an example of removing the issue of model cards for text generation.

StabilityAI keeps (Nov '22) updating (July '23) its licenses (Nov '23), so I've removed them from the main page until they settle down.

GLIGEN and GLIGEN GUI are quite neat, but they state strict model terms and conditions for their use.

Testing model viability

Tools are needed to assess the pros/cons of each model.

Leaderboards

Because a model can simply be trained to do well on whatever tests a leaderboard uses, consulting multiple leaderboards is preferable (hence not putting HuggingFace on the main page). A more comprehensive evaluation would be a meta-analysis of existing leaderboards.
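
One simple way to approximate such a meta-analysis is to average each model's rank across boards. The sketch below is only an illustration: the function name `mean_rank`, the board names, and the model names are all hypothetical, and averaging over only the boards that list a model can flatter models that appear on few boards.

```python
from statistics import mean


def mean_rank(leaderboards: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Combine leaderboards by averaging each model's rank (1 = best).

    Each leaderboard is a list of model names ordered best to worst.
    A model is averaged only over the boards that include it.
    """
    ranks: dict[str, list[int]] = {}
    for ordering in leaderboards.values():
        for position, model in enumerate(ordering, start=1):
            ranks.setdefault(model, []).append(position)
    # Sort by mean rank, then by name so ties are deterministic.
    return sorted(
        ((model, mean(positions)) for model, positions in ranks.items()),
        key=lambda item: (item[1], item[0]),
    )


# Hypothetical boards and models, for illustration only.
boards = {
    "board_a": ["model-x", "model-y", "model-z"],
    "board_b": ["model-y", "model-x", "model-z"],
    "board_c": ["model-y", "model-z", "model-x"],
}

for model, rank in mean_rank(boards):
    print(f"{model}: mean rank {rank:.2f}")
```

A model that tops one board but ranks poorly elsewhere ends up in the middle, which is the point of combining boards rather than trusting one.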

Text

HuggingFace's Open LLM leaderboard

AlpacaEval Leaderboard

Large Model Systems Organization Leaderboard

Yao Fu's (FranxYao) chain-of-thought hub

Mike Ravkine's CanAiCode Leaderboard 🏆

BigCode Models Leaderboard

Ayumi Benchmark v3

Another LLM Roleplay Rankings

censorbench

Google Sheet of models, AI labs, datasets, and various other ML info by Alan Thompson

Voice

Open TTS Tracker

Ordinal value scales could exist for

indistinguishability from human creations - e.g. Human or Not? social turing chat game
inference speed
length of memory
trivia accuracy
computation costs: cpu/vram/ram mhz

Source of model training data

amount of data
date range (e.g. distinguishing old science from new science for smaller scale models)
level of censorship (important to make personal+research use distinct from business use)

Problem solving

math
creative problem solving (there exists methodology for testing this in humans)

"The present findings suggest that the current state of AI language models demonstrate higher creative potential than human respondents." Nature (Feb 10, '24)

General trends

Larger models are more prone to human superstition[1], but also generate more human-like readability.
Quantization (a la GPTQ) allows consumer hardware to run large models.
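
A back-of-the-envelope calculation shows why quantization matters for consumer hardware. The sketch below estimates the memory needed for the weights alone as parameters × bits per weight ÷ 8. It is an assumption-laden lower bound: it ignores activations, context caches, and the small overhead that quantization formats add for scales and similar metadata. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# A 13B-parameter model at common precisions.
for bits in (16, 8, 4):
    size = weight_memory_gb(13e9, bits)
    print(f"13B parameters at {bits}-bit: ~{size:.1f} GB")
```

By this estimate a 13B model drops from roughly 26 GB at 16-bit to roughly 6.5 GB at 4-bit, which is the difference between needing datacenter hardware and fitting on a consumer GPU or in ordinary RAM.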

Stable Diffusion

Stable Diffusion model files (.ckpt) are released under a non-free license.

The starting point for Stable Diffusion is the v1-4 model page, https://huggingface.co/CompVis/stable-diffusion-v1-4, together with its license page, https://huggingface.co/spaces/CompVis/stable-diffusion-license

stable-diffusion-webui

https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Demo, and guide - https://www.youtube.com/watch?v=R52hxnpNews
- Depends on stable-diffusion repository instead of diffusers. The current branch of stable-diffusion only has a model license which does not make sense for code. Older commits are MIT.
- Extensions:
  - Deforum

Large Language Models

Censorship issues

A guide to decensoring models; I would exercise caution, as it stands to reason that an inherently uncensored model would perform better than one that needs the legwork of decensoring (which risks making mistakes and missing some of the censorship).

Legacy notable projects

Vicuna 13B - It appears as though this model is inherently censored [2]
Alpaca 13B
Pygmalion 6B

CodeGen by Salesforce FSD | BSD 3-Clause "New" or "Revised" License
CodeGen2 by Salesforce FSD | Apache-2.0

Project	Credit	License	Description
DALL-E Mini	borisdayma (Boris Dayma)	Apache 2.0	Generate images from a text prompt
neural-style	anishathalye	GPLv3	An implementation of neural style in TensorFlow

External links

Applications of artificial intelligence (Wikipedia)
