Difference between revisions of "Free Software Directory:Participate/Script aid"

Revision as of 13:14, 26 February 2021

Purpose

This script aims to help contributors find things that might need to be considered during the evaluation of a project or piece of software.

Besides searching for common licensing terms, it lists the MIME/media types of all files in the current directory, so that one can find files that lack complete and corresponding source, and also find JavaScript files in which to look for, or insert, GNU LibreJS syntax.

To tackle cases of files which are partly JavaScript, it also scans for words related to JS and event handlers.

As an extra precaution, another pass is made for other possibly problematic words.

There is also a notes field, which is always left empty so that the evaluator can record observations or insert continuation marks for resuming the review later.

The output is in CSV format, making it suitable for parsing by other software, such as LibreOffice Calc, GNU R and even GNU Awk. For GNU Awk, first set FPAT to ([^,]*)|(\"[^\"]+\") and RS to \r\n. These settings are based on the related section in the GNU Awk User's Guide and on Awk's Texinfo/Info page.
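
As a minimal sketch of the GNU Awk settings above, the following shell function lists the paths whose "Licensing" column is TRUE. The function name list_licensing_hits and the file name in the usage comment are hypothetical; only the FPAT and RS values come from the text above. It assumes GNU Awk is installed as gawk.

```shell
#!/bin/bash

# Hypothetical helper: print the "Path" field of every row whose fourth
# column ("Licensing") is TRUE. Requires GNU Awk, since FPAT is a gawk
# extension.
# $1: path to the CSV file produced by the script.
list_licensing_hits() {
  gawk '
    BEGIN {
      # Field and record settings described above.
      FPAT = "([^,]*)|(\"[^\"]+\")"
      RS = "\r\n"
    }
    # Skip the header row; fields keep their surrounding double quotes.
    NR > 1 && $4 == "TRUE" { print $1 }
  ' "$1"
}

# Example call (the file name is an assumption):
# list_licensing_hits "evaluation.csv"
```

Changing $4 to $5 or $6 would select the "JavaScript" or "Other" column instead.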

In all cases, when importing, make sure that the selected language is English, so that "TRUE" and "FALSE" are converted to the correct boolean representation in your language of choice. Also, be aware of false positives.

You're welcome to contribute to this script and add your name and contact information to the copyright notice of the script.

Usage

First, it's recommended to grant executable permissions to your user, like so:

chmod u+x [path to FSD Script Aid.sh]

Then, take the complete corresponding source of the project being evaluated (e.g., when using git clone, you can accomplish this with the --recursive option).

git clone --recursive [Some git repository.]
cd [Directory created by git]
[Script aid.] > [Desired text file to store output.]; printf '\a'

printf '\a' can be replaced by a command to play an audio file of your choice.

Now leave the script to do its work and wait for the sound cue before continuing with the evaluation.

After this, you might be wondering how to jump to the exact points where the script found something in each file. For this purpose there is the --print-ere (-p) option. It prints an extended regular expression (ERE) that you can reuse in any text editor to look for the same things, per file.

For example, you can do this:

less -p "$([Script aid.] -p)" "file1" "file2" "fileN"
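
If you prefer a non-interactive listing, the same ERE can be handed to grep. The sketch below is only an illustration: show_matches is a hypothetical helper, and the pattern in the usage comment would be whatever the --print-ere option printed. It uses grep's -E, -i and -n options, mirroring the case-insensitive search that the script itself performs.

```shell
#!/bin/bash

# Hypothetical helper: print every matching line, prefixed by file name
# and line number.
# $1: extended regular expression; remaining arguments: files to search.
show_matches() {
  local ere="$1"
  shift
  # /dev/null as an extra operand makes grep print the file name even
  # when only one file is given.
  grep -Ein -- "$ere" "$@" /dev/null
}

# Example call (the script path is a placeholder, as in the text above):
# show_matches "$([Script aid.] -p)" "file1" "file2"
```

Each output line has the form file:line:text, which most editors can use to jump straight to the match.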


Script

#!/bin/bash

#    FSD Script Aid: Helps evaluate entries for Free Software Directory.
#    Copyright (C) 2016, 2018, 2020, 2021  Adonay "adfeno" Felipe Nogueira
#                               <https://libreplanet.org/wiki/User:Adfeno>
#                                                  <adfeno@hyperbola.info>
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU Affero General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU Affero General Public License for more details.
#
#    You should have received a copy of the GNU Affero General Public License
#    along with this program.  If not, see <https://www.gnu.org/licenses/>.


# # Dependencies
#
#
# * GNU bash;
# * any implementation of the following POSIX utilities:
#   * file, whose -i option prints MIME type and character set;
#   * find;
#   * grep;
#   * printf;
#   * sed;
#   * tr.
#
#
# # Usage
#
#
# It's recommended to grant executable permissions to your user, like so:
#
# chmod u+x [path to FSD Script Aid.sh]
#
# Then, change to a directory containing the source files.
#
# Now do:
#
# [path to FSD Script Aid.sh] [options] > [path to the output .CSV file]
#
# Some options prevent normal output, so you could just do:
#
# [path to FSD Script Aid.sh] [options]
#
#
# # Options
#
#
# ## '--print-ere' ('-p')
#
# Prevent normal output, and print an extended regular expression (ERE) that
# can be reused for checks against any file.
#
# This can be reused against each file with a text editor's regular
# expression search.
#
# For example, with GNU less, one can do:
#
# less -p "$([path to FSD Script Aid.sh] -p)" "file1" "file2" "fileN"
#
# … and then press N to go to each next match in the first file. Shift + N
# searches backwards in the same file.


# # Extended regular expressions (ERE) to match possible licensing issues


# Copyright symbol
licensing_ere="©"

# Use GNU bash's += assignment to append to an existing variable.
#
# Agreement
licensing_ere+="|agreement|a(co|cue)rdo"
# Allowed
licensing_ere+="|allowed|permitid[ao]"
# As is
licensing_ere+="|as[[:space:]-]*is[^[:alnum:]]"
# Condition
licensing_ere+="|condi(tions?|ci(ón|ones)|ç(ão|ões))"
# Copyright, copyleft, copyfarleft, copyfair, copymi
licensing_ere+="|copy(right|(far)?left|fai?r|m[ei])"
# EULA, exclusive
licensing_ere+="|eula|exclusiv[aeo]"
# Forbid
licensing_ere+="|forbid(s|den)?|pro(hibited|h?ibid[ao])"
# License abbreviations
licensing_ere+="|[al]?gpl|fdl"
# Law
licensing_ere+="|l(aw|e[iy])"
# Liable
licensing_ere+="|liab(le|ilit(y|ies))"
# Responsible
licensing_ere+="|respons(ib(le|ilit(y|ies))|ab(le|ilidad(es?)?)|áve(l|is))"
# License
licensing_ere+="|licen([cs]e|(ç|ci)a)"
# Notice
licensing_ere+="|not(ice|[ií]cia|ifica(tion|ção|ción))"
# Patent, right
licensing_ere+="|patente?|right|d(erech|ireit)o|droit"
# Terms
licensing_ere+="|t(erms?|érminos?|ermos?)"
# Trade
licensing_ere+="|trade[[:space:]]+(mark|secret)"
# Transfer, warrant
licensing_ere+="|transfer|gu?arant|warrant"


# # Extended regular expression (ERE) to match JavaScript issues


# Start of boundary
javascript_ere="(^|[^[:alnum:]]+)"
javascript_ere+="("

# Script tag or addEventListener() function
javascript_ere+="script|addeventlistener"

# Start of after, before, on variations
javascript_ere+="|(after|before|on)"
javascript_ere+="("

# Abort, autocomplete, blur, cancel, canplay
javascript_ere+="abort|autocomplete(error)?|blur|cancel|canplay(through)?"
# ( Cue, duration, hash, language, rate, readystate, volume ) … change
javascript_ere+="|(cue|duration|hash|language|rate|readystate|volume)?change"
# Click, close, contextmenu
javascript_ere+="|(db)?click|close|contextmenu"
# Drag and drop
javascript_ere+="|drag(end|enter|exit|leave|over|start)|drop"
# Emptied, ended, error, focus, input, invalid
javascript_ere+="|emptied|ended|error|focus|input|invalid"
# Key presses
javascript_ere+="|key(down|press|up)"
# Load
javascript_ere+="|(un)?load(ed(meta)?data)?"
# Start, message
javascript_ere+="|start|message"
# Mouse
javascript_ere+="|mouse(down|enter|leave|move|out|over|up|wheel)"
# Connectivity and page
javascript_ere+="|(off|on)line|page(hide|show)"
# Play, popstate, print, progress
javascript_ere+="|pause|play(ing)?|popstate|print|progress"
# Reset, resize, scroll, seek
javascript_ere+="|reset|resize|scroll|seek(ed|ing)"
# Select, show, sort, stalled, storage, submit
javascript_ere+="|select|show|sort|stalled|storage|submit"
# Suspend, timeupdate, toggle, waiting
javascript_ere+="|suspend|timeupdate|toggle|waiting"

# End of after, before, on variations
javascript_ere+=")"

# End of boundary
javascript_ere+=")"
javascript_ere+="([^[:alnum:]]+|$)"


# # Extended regular expression (ERE) to match possible other issues


# Google
other_ere="gcm|google([^[:alnum:]]*cloud[^[:alnum:]]*messaging)?|youtube|yt"
# Microsoft, Facebook, WhatsApp, Telegram
other_ere+="|microsoft|facebook|whats[^[:alnum:]]*app|telegram"
# CDNs, Amazon, CloudFlare
other_ere+="|amazon|aws|cloud[^[:alnum:]]*flare|cdn"
# Apple, Uber, AirBnB
other_ere+="|apple|uber|air[^[:alnum:]]*bnb"
# System distributions
other_ere+="|android|lineage|cyanogen"
# DRM, Chrome and derivatives
other_ere+="|drm|chrom(e|ium)|electron"

if [[ $1 == "--print-ere" || \
  $1 == "-p" ]]; then

  # Print the extended regular expressions so that you can use them later.
  echo "$licensing_ere|$javascript_ere|$other_ere"
  exit
fi

# Separates MIME type from the character set.
#
# * Arguments
#   * $1: file path to inspect MIME type and character set.
# * Standard output
#   * Comma-separated pair consisting of MIME type and character set.
separate_mime_from_charset() {
# Make use of -i to get MIME type and character set.
# sed does the split and tr deletes line feeds/new lines.
  file -hi "$1" \
    | sed \
      '{
         s/^\.\//"/g
         s/: /",/g
         s/; charset=/,/g
       }' \
      | tr -d "\n"
}

# Tells if ERE was found in file.
#
# * Arguments
#   * $1: extended regular expression to be searched;
#   * $2: file path to look for given extended regular expression.
# * Standard output
#   * If a match was found
#     * String "TRUE"
#   * Else
#     * String "FALSE"
was_ere_found() {

	grep -Eiq "$1" "$2"
	[ $? -eq 0 ] && printf "TRUE" || printf "FALSE"
}

# Use GNU bash's -f option to export functions for use with find.
export -f separate_mime_from_charset
export -f was_ere_found

printf '"Path","MIME","Charset","Licensing","JavaScript","Other","Notes"\r\n'
find "." \
	\( \
		-type d \
		\( -name '.cvs' -o -name '.hg' -o -name '.git' -o -name '.svn' \) \
	\) -prune -o \
	! -type d \
	-exec bash -c \
		'separate_mime_from_charset "$1"' _ {} \; \
	-exec printf ',' \; \
	-exec bash -c \
		'was_ere_found "$1" "$2"' _ "$licensing_ere" {} \; \
	-exec printf ',' \; \
	-exec bash -c \
		'was_ere_found "$1" "$2"' _ "$javascript_ere" {} \; \
	-exec printf ',' \; \
	-exec bash -c \
		'was_ere_found "$1" "$2"' _ "$other_ere" {} \; \
	-exec printf ',' \; \
	-exec printf '\r\n' \;
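
To see what separate_mime_from_charset produces, the sed expressions can be tried on a sample line without running file. This is a sketch under the assumption that file -hi prints lines shaped like "./path: type/subtype; charset=name"; the sample path and the function name split_file_output are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: apply the same three substitutions as the script to
# lines read from standard input, then delete the trailing line feed.
split_file_output() {
  sed \
    -e 's/^\.\//"/g' \
    -e 's/: /",/g' \
    -e 's/; charset=/,/g' \
    | tr -d "\n"
}

# Sample line shaped like the assumed output of "file -hi".
printf './src/main.c: text/x-c; charset=us-ascii\n' | split_file_output
printf '\n'
# prints "src/main.c",text/x-c,us-ascii
```

Note that a path which itself contains ": " or a double quote would not be split cleanly by these substitutions, so such rows deserve a manual check.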

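
Because false positives are expected, it can help to test a fragment of an ERE against sample phrases before relying on it. The following sketch copies the "copy" alternative from the licensing ERE and reuses the TRUE/FALSE convention of was_ere_found; the helper name ere_matches_text and the sample phrases are assumptions for illustration.

```shell
#!/bin/bash

# Fragment copied from the licensing ERE in the script above.
sample_ere="copy(right|(far)?left|fai?r|m[ei])"

# Hypothetical helper: like was_ere_found, but reads the text from an
# argument instead of a file.
# $1: extended regular expression; $2: text to test.
ere_matches_text() {
  if printf '%s\n' "$2" | grep -Eiq -- "$1"; then
    printf "TRUE"
  else
    printf "FALSE"
  fi
}

ere_matches_text "$sample_ere" "Copyleft notice"; printf '\n'
ere_matches_text "$sample_ere" "copy the file";   printf '\n'
```

The first call reports TRUE and the second FALSE, since "copy" alone is not one of the listed alternatives.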

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.