Difference between revisions of "Free Software Directory:Import"

Latest revision as of 06:39, 10 April 2023

This project lets us import package information from free software repositories. Long ago a friend put together scripts that we used to import thousands of entries. Thousands more could be added if we updated the import scripts. The project has been sitting idle for a while at <https://savannah.gnu.org/p/directory>, waiting for volunteers to jump in and refine it.

Team Captain: Free Software Foundation

Participants: See https://savannah.gnu.org/project/memberlist.php?group=directory

Debian repository metadata files

See https://git.savannah.gnu.org/cgit/directory.git/tree/src/FSD/Download.hs?h=haskell

On Debian, and also on Trisquel (which is Ubuntu-based), apt downloads these files to /var/lib/apt/lists.
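
To illustrate what these metadata files contain, here is a minimal Python sketch of a parser for the control-file layout they use: "Key: value" lines, continuation lines that start with whitespace, and a blank line between packages. This is not the code in Download.hs; the function name parse_stanzas and the sample data are made up for illustration. Package index files under /var/lib/apt/lists typically have names ending in _Packages and may be stored compressed on some systems, which this sketch does not handle.

```python
from typing import Dict, Iterable, Iterator


def parse_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per stanza of a Debian control-style file."""
    stanza: Dict[str, str] = {}
    last_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t":
            # Continuation line: append it to the previous field.
            if last_key is not None:
                stanza[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza


# Made-up sample data in the same layout as a Packages file.
sample = """\
Package: hello
Version: 2.10-2
Description: example package
 A longer description
 that spans lines.

Package: sed
Version: 4.8-1
"""
packages = list(parse_stanzas(sample.splitlines()))
print(packages[0]["Package"], packages[1]["Version"])
```

To try it on a real file, pass an open file object instead of sample.splitlines().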

Branches

Pei based the Haskell branch on the old Directory import scripts, so it formats pages the same way as the Python branch. Neither branch produces pages in the new format, so neither branch is stable.

Python

The Python source files are in the master branch (https://git.savannah.gnu.org/cgit/directory.git/tree/).

Haskell

See the haskell branch (https://git.savannah.gnu.org/cgit/directory.git/tree/?h=haskell).

If your version of cabal-install is too old, you do not need to upgrade your OS. Try installing the Haskell tools with GHCup instead: https://www.haskell.org/ghcup/

Trisquel 11 installation guide for the Haskell branch

sudo apt-get install cabal-install git    # install cabal and git
cabal update                              # refresh the package index
git clone https://git.savannah.gnu.org/git/directory.git  # clone the repo
cd directory
git fetch origin                          # fetch the remote branches, including haskell
git checkout origin/haskell               # check out the haskell branch (detached HEAD)

About

The Directory import project is a script that downloads metadata from the Debian main repository (which contains only Free Software) and constructs MediaWiki entries that can be imported into this wiki.

The Directory import project has not imported meta-data since 2011.

To contribute to the Directory import project, you have to become a member of the Import Team.


We have begun importing packages from the main area of Debian GNU/Linux. Right now the process has three steps:

  • We have a tool (to be published on Savannah shortly) that collects package metadata and outputs a JSON file.
  • We clean up the JSON file by hand, for example by removing parts of the description that are Debian-specific.
  • We then generate a set of wiki files, which are imported into the Directory using a simple import script.
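
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual tool: the template name Entry, its field names, and the sample record are hypothetical, and the automatic filter in clean_description is a crude stand-in for the cleanup that is really done by hand.

```python
import json


def clean_description(text: str) -> str:
    """Drop lines that mention Debian; a crude stand-in for manual cleanup."""
    kept = [ln for ln in text.splitlines() if "debian" not in ln.lower()]
    return "\n".join(kept).strip()


def to_wikitext(pkg: dict) -> str:
    """Render one package as a template call.

    The template name and field names are hypothetical.
    """
    fields = [
        ("Name", pkg["Package"]),
        ("Version", pkg.get("Version", "")),
        ("Description", clean_description(pkg.get("Description", ""))),
    ]
    body = "\n".join(f"|{key}={value}" for key, value in fields)
    return "{{Entry\n" + body + "\n}}"


# Made-up record standing in for the collected metadata.
record = {
    "Package": "hello",
    "Version": "2.10-2",
    "Description": "Greets the user.\nThis Debian build adds a manpage.",
}
dumped = json.dumps([record], indent=2)  # step 1: metadata written as JSON
loaded = json.loads(dumped)              # steps 2 and 3 start from that JSON
page = to_wikitext(loaded[0])
print(page)
```

Keeping the JSON file as the hand-off point between collection and page generation means the manual cleanup can happen in any text editor before the wiki files are produced.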

See Debian-2013-03-20 for statistics on our first round of importing from Debian.

Further work on importer

  • The first challenge is to figure out which Trisquel projects correspond to which FSD projects, and which ones have no match in the other database.
    • I've written a program (https://gitorious.org/fuzzyfields) that takes lines of tab-separated fields and returns info about approximate matches. Once the output is generated, people need to go through it and select the correct choice. You can get the input I used and the output it generated in an attachment to the directory-discuss mailing list (http://lists.gnu.org/archive/html/directory-discuss/2012-10/msg00010.html).
    • We need someone to make a JavaScript interface that lets people choose what they think the correct match is. We'll save those results for the steps below.
  • Then, a program needs to read from the Trisquel package database to gather all of the info.
    • malberts was talking on IRC about a program (the appnr API, https://launchpad.net/appnr-api, used by appnr.com) that downloads apt package info from a repository, adds the info to an SQL database, and offers an API for working with the data.
  • Then a program needs to be written to automatically update FSD entries based on the Trisquel data.
  • If/when the flagged revs plugin is installed, those updates can be approved on a case-by-case basis, to make sure that things match and nothing is broken.
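
As a rough illustration of the approximate-matching step in the list above, the following Python sketch proposes candidate FSD names for each Trisquel package name. It is not the fuzzyfields program: it uses difflib from the Python standard library as a stand-in, and the function name propose_matches, the cutoff value, and the sample names are assumptions made for this example. A person would still need to review the proposals and pick the correct match.

```python
import difflib


def propose_matches(trisquel_names, fsd_names, cutoff=0.8):
    """Map each Trisquel name to a list of similar FSD names (possibly empty)."""
    # Compare case-insensitively, but report the original FSD spelling.
    lowered = {name.lower(): name for name in fsd_names}
    proposals = {}
    for name in trisquel_names:
        close = difflib.get_close_matches(
            name.lower(), list(lowered), n=3, cutoff=cutoff
        )
        proposals[name] = [lowered[c] for c in close]
    return proposals


# Made-up names for illustration.
proposals = propose_matches(["gnu-emacs", "libfoo"], ["GNU Emacs", "Hello"])
for trisquel_name, candidates in proposals.items():
    print(trisquel_name, "->", candidates or "no match")
```

Names with an empty candidate list are the ones that appear to have no match in the other database.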


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.