Free Software Directory:Big Data Team
Big data refers to the vast amounts of complex and diverse data that exceed the processing capabilities of traditional software, requiring specialized tools and techniques to manage and analyze effectively.
The 5 V structure of Big Data refers to five key characteristics that define the challenges of working with large datasets:
- Volume: the massive scale of data being generated.
- Variety: the diverse range of data types and sources.
- Value: the importance of extracting insights and meaning from the data.
- Veracity: the need to ensure data accuracy and trustworthiness.
- Velocity: the rapid pace at which data is generated and processed.
Group info and user info

User | Role | Reference | Real name | libera.chat nick | Time zone | Title |
---|---|---|---|---|---|---|
David_Hedlund | Coordinator | | David Hedlund | David_Hedlund | Europe/Stockholm | |
GrahamxReed | Team Captain | | Graham Reed | Graham_Reed | America/New_York | |
Database management systems (DBMS)
Traditional relational database management systems (RDBMSs) are well suited to structured data, but the shift towards semi-structured and unstructured data types and formats has pushed the limits of these technologies and motivated the development of new tools. For example, PostgreSQL and MySQL are both RDBMSs designed to handle large amounts of data, but they may not be suitable for all big data workloads.
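The difference between structured and semi-structured data can be illustrated with a small sketch. It uses Python's built-in `sqlite3` and `json` modules as a stand-in for a relational database; the `events` records and the table names `events_fixed` and `events_doc` are hypothetical and chosen only for this illustration.

```python
import json
import sqlite3

# Hypothetical records: the second one carries a nested field the first lacks.
events = [
    {"user_name": "alice", "action": "login"},
    {"user_name": "bob", "action": "upload", "file": {"name": "a.csv", "bytes": 2048}},
]

conn = sqlite3.connect(":memory:")

# Structured approach: a fixed schema declared up front.
# Fields outside the schema (here, "file") have nowhere to go and are dropped.
conn.execute("CREATE TABLE events_fixed (user_name TEXT NOT NULL, action TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events_fixed (user_name, action) VALUES (?, ?)",
    [(e["user_name"], e["action"]) for e in events],
)

# Semi-structured approach: store each record whole as a JSON document,
# so records may differ in shape without a schema change.
conn.execute("CREATE TABLE events_doc (body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events_doc (body) VALUES (?)",
    [(json.dumps(e),) for e in events],
)

fixed_rows = conn.execute(
    "SELECT user_name, action FROM events_fixed ORDER BY user_name"
).fetchall()
doc_rows = [
    json.loads(row[0])
    for row in conn.execute("SELECT body FROM events_doc ORDER BY rowid")
]
```

In this sketch the fixed table keeps only the two declared columns, while the document table preserves the nested `file` field. Real deployments involve more trade-offs than this shows, such as indexing and query support for the document form.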
Libre/free database management systems (DBMS) for big data
Name | Description |
---|---|
Apache Cassandra | A NoSQL, distributed, multi-master database designed for handling large amounts of data across many commodity servers with no single point of failure. |
Apache HBase | A NoSQL, distributed, column-family database built on top of Hadoop and designed to handle large amounts of structured and semi-structured data. |
Apache Ignite | An in-memory, distributed, fault-tolerant computing platform with SQL and NoSQL support and features for machine learning and analytics. |
Apache Kudu | A column-store, distributed database designed for fast analytics and low-latency queries on large datasets. |
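To show why a column-store layout suits analytical queries, the following is a minimal sketch in plain Python. It does not use any of the systems listed above; the sample `rows` and the helper `to_columns` are hypothetical and only illustrate the idea of keeping each column's values together.

```python
# Hypothetical sample data in row-oriented form: one dict per record.
rows = [
    {"id": 1, "region": "eu", "bytes": 120},
    {"id": 2, "region": "us", "bytes": 300},
    {"id": 3, "region": "eu", "bytes": 80},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregate has to visit every full record.
total_from_rows = sum(record["bytes"] for record in rows)

# Column layout: the aggregate reads only the one column it needs.
total_from_columns = sum(columns["bytes"])
```

Both totals are the same; the difference is how much data each approach has to touch, which matters when a table has many columns and a query uses only a few of them.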
Frameworks and platforms
- Apache Hadoop: a distributed computing framework for processing large data sets.
- Apache Spark: a data processing engine that supports big data workloads.
- Apache Flink: a platform for distributed stream and batch processing.
- Apache Storm: a distributed real-time computation system.
- Apache Kafka: a distributed streaming platform for building real-time data pipelines.
- Apache NiFi: a data integration platform for managing and processing big data flows.
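As an illustration of the map, shuffle, and reduce pattern associated with Hadoop's MapReduce programming model, here is a single-process word count in plain Python. It is a sketch of the idea only and does not call any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data tools", "Big data frameworks"])
```

In a distributed framework the map and reduce steps would run in parallel on different machines, with the shuffle moving data between them; this sketch runs everything in one process to keep the data flow visible.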
Web frameworks with built-in support for big data technologies
Big data tools and libraries
Data processing libraries
Data visualization libraries
Machine learning libraries
Data storage and retrieval libraries
Other big data tools and libraries
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the page “GNU Free Documentation License”.
The copyright and license notices on this page only apply to the text on this page. Any software or copyright-licenses or other similar notices described in this text has its own copyright notice and license, which can usually be found in the distribution or license text itself.