Free Software Directory:Big Data Team



Big data refers to the vast amounts of complex and diverse data that exceed the processing capabilities of traditional software, requiring specialized tools and techniques to manage and analyze effectively.

The "5 Vs" of big data are five characteristics that define the challenges of working with large datasets: Volume, the massive scale of data being generated; Variety, the diverse range of data types and sources; Value, the importance of extracting insights and meaning from the data; Veracity, the need to ensure data accuracy and trustworthiness; and Velocity, the rapid pace at which data is generated and processed.


Group info                              User info
User           Role          Reference  Real name      libera.chat nick  Time zone         Title
David_Hedlund  Coordinator              David Hedlund  David_Hedlund     Europe/Stockholm
GrahamxReed    Team Captain             Graham Reed    Graham_Reed       America/New_York


Database management systems (DBMS)

Traditional relational database management systems (RDBMSs) are well suited to structured data, but the shift towards semi-structured and unstructured data types and formats has pushed the limits of these technologies, necessitating the development of new tools and solutions. For example, PostgreSQL and MySQL are both RDBMSs that can handle large amounts of data, but they may not be suitable for all big data workloads.


Libre/free database management systems (DBMS) for big data


Apache Cassandra
A NoSQL, distributed, multi-master database designed for handling large amounts of data across many commodity servers with no single point of failure.
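
The idea of spreading data across many servers with no single point of failure can be illustrated with a toy hash ring. The sketch below is not Cassandra code: the `Ring` class, the `token` function, the node names and the use of SHA-256 are all assumptions made for illustration, and Cassandra's real partitioner and virtual-node scheme are more elaborate. It only shows how a key can map to several replica nodes so that losing one node still leaves copies available.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (hash choice is illustrative)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: each node owns one token; a key is stored on the
    next `replication_factor` nodes found walking clockwise from its token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """Return the nodes that would hold copies of `key`."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
# Simulate losing the first replica: the remaining nodes still hold the key.
survivors = [n for n in owners if n != owners[0]]
```

Any node can compute `replicas` on its own, which is why no central coordinator is needed in this simplified picture.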


Apache HBase
A NoSQL, distributed, column-family database built on top of Hadoop and designed to handle large amounts of structured and semi-structured data.
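
A column-family data model can be pictured as nested maps: a row key leads to columns grouped under families, and each cell may keep several timestamped versions. The following is a minimal in-memory sketch of that idea, not the HBase client API; `ToyTable`, its methods and the sample data are hypothetical names introduced here.

```python
from collections import defaultdict


class ToyTable:
    """Toy column-family table: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers are free-form per row.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][f"{family}:{qualifier}"][timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the most recent version of a cell, or None if absent."""
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}")
        if not versions:
            return None
        return versions[max(versions)]


table = ToyTable(families=["info", "metrics"])
table.put("user#1", "info", "name", "Ada", timestamp=1)
table.put("user#1", "info", "name", "Ada L.", timestamp=2)
table.put("user#2", "metrics", "logins", 7, timestamp=1)

latest_name = table.get("user#1", "info", "name")
missing = table.get("user#2", "info", "name")
```

Rows need not share the same columns, which is what makes this layout a fit for semi-structured data.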


Apache Ignite
An in-memory, distributed, fault-tolerant computing platform with SQL and key-value (NoSQL) access and support for machine learning and analytics.


Apache Kudu
A column-store, distributed database designed for fast analytics and low-latency queries on large datasets.
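
Why a column store helps analytics can be shown with plain Python lists. This sketch is unrelated to Kudu's actual storage format; the sample records and the `to_columns` helper are made up for illustration, and it assumes every record has the same fields. It pivots row-oriented records into one list per column, so an aggregate over a single column touches only that column's values.

```python
# Row-oriented layout: each record carries every field.
rows = [
    {"id": 1, "city": "Oslo", "temp_c": 4.0},
    {"id": 2, "city": "Lima", "temp_c": 19.5},
    {"id": 3, "city": "Oslo", "temp_c": 6.0},
]


def to_columns(records):
    """Pivot a list of same-shaped records into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: values of one field are stored together.
columns = to_columns(rows)

# The average temperature reads only the "temp_c" column, not whole rows.
mean_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
```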


Frameworks and platforms

  • Apache Hadoop: a distributed computing framework for processing large data sets.
  • Apache Spark: a data processing engine that supports big data workloads.
  • Apache Flink: a platform for distributed stream and batch processing.
  • Apache Storm: a distributed real-time computation system.
  • Apache Kafka: a distributed streaming platform for building real-time data pipelines.
  • Apache NiFi: a data integration platform for managing and processing big data flows.
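
To give a feel for the kind of processing these frameworks distribute, here is a single-process sketch of the map, shuffle and reduce steps associated with Hadoop's MapReduce model, applied to a word count. It does not use any Hadoop API; the function names and the input lines are assumptions for illustration, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big tools", "data flows"])
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across workers without changing the result.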

Web frameworks with built-in support for big data technologies

Big data tools and libraries

Data processing libraries

Data visualization libraries

Machine learning libraries

Data storage and retrieval libraries

Other big data tools and libraries



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.

The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.