Cloud Storage – an overview | ScienceDirect Topics

GlusterFS

GlusterFS [13] is an open-source, scale-out file system that offers NFS, SMB, and HDFS access. With more than 8000 user-developers at the time of writing, it has a solid following in the HPC market segment. The software was designed to address the problems of scaling out file and NAS systems, where crossing the boundary between appliances presented a severe roadblock to petabyte-sized configurations.

GlusterFS is also discussed in the "File Systems for Big Data" section of Chapter 8, where the emphasis is more on software structure.

GlusterFS is deployed on COTS servers, either within virtual instances or on bare metal. Scaling capacity is achieved by adding more appliances to the cluster, while performance scaling can be controlled by spreading data out across the appliance pool. Clusters with several petabytes of storage are achievable and multitenant environments with thousands of tenants can be created.

Gluster avoids the scaling issues of centralized metadata by distributing that data across the nodes. It borrows from RAID concepts to mirror, stripe, and replicate data for integrity purposes, but it does so at the appliance level rather than at the drive level. This is critical in large-scale storage, since the failure rate of appliances is high enough that data availability would be at risk if protection covered only drive failures.
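
The idea of locating data without a central metadata server can be illustrated with a small sketch. The code below is not GlusterFS's actual placement algorithm; it is a minimal Python illustration, assuming a hypothetical `place_file` helper and made-up node names, of how every client can compute the same appliance-level replica locations from nothing more than a file path.

```python
import hashlib


def place_file(path, nodes, replicas=2):
    """Pick `replicas` distinct nodes for `path` using only a hash of its name.

    Hypothetical helper for illustration: no lookup table or metadata
    server is consulted, so any client computes the same answer.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash to choose a starting node.
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Replicas go to the following nodes, so each copy is on a different appliance.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Assumed node names, purely illustrative.
nodes = ["node-a", "node-b", "node-c", "node-d"]
copies = place_file("/projects/run42/output.dat", nodes, replicas=2)
print(copies)
```

Because the copies always land on different nodes, losing a whole appliance still leaves one replica reachable. A simple modulo scheme like this one would relocate most files whenever a node is added or removed, which is one reason production systems use more elaborate hashing.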

GlusterFS is a Linux user-space file system, a deliberate choice that allows faster integration into systems. It uses a Linux kernel file system to format the disk space. As a result, it can be deployed in environments such as AWS, which is a good way to get started with the approach. Networking is very well supported, and since January 2015 it has been possible to connect clusters with RDMA for low latency and high performance. iSCSI is supported, too. In common with object storage, replication control is well featured and supports geodiversity.

GlusterFS supports Hadoop seamlessly without the need for application rewrites. This brings fault tolerance to the Hadoop space and allows the choice of file or object access to the data. In fact, APIs have been created to allow objects to be accessed as files and vice versa.

It is also possible to install GlusterFS as a file system for Cinder, the block-IO access model in OpenStack.

Cloud storage can be built around GlusterFS as virtualized systems, and large NAS systems can be created as well. These constitute the bulk of GlusterFS deployments, according to Red Hat.

The file system serves the analytics market particularly well, with strong Hadoop compatibility on the big data side and Splunk support for machine analytics. Rich media and data streaming can take advantage of performance tuning and scaled-out capacity.

GlusterFS, under Red Hat's direction, is adding features rapidly. The 3.1 release, Everglades, adds erasure coding (keeping up with the object storage companies), data tiering, and SMB 3.0 support.
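
To make the erasure coding idea concrete, here is a minimal Python sketch of its simplest form: `k` data fragments plus a single XOR parity fragment, where any one lost fragment can be rebuilt from the survivors. This is an illustration only, not the code GlusterFS ships. The function names `encode` and `recover` are hypothetical, and production systems typically use stronger codes that tolerate more than one loss.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split `data` into k equal fragments (zero-padded) plus one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments, missing):
    """Rebuild the fragment at index `missing` by XOR-ing all the others."""
    survivors = [f for i, f in enumerate(fragments) if i != missing]
    return reduce(xor_bytes, survivors)


data = b"GlusterFS stripes data across appliances"
fragments = encode(data, k=4)          # 4 data fragments + 1 parity
rebuilt = recover(fragments, missing=2)  # pretend the third fragment was lost
assert rebuilt == fragments[2]
```

With four data fragments and one parity fragment, the cluster stores 1.25 times the original data, compared with 2 times for a simple mirror, which is why erasure coding is attractive at petabyte scale.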

In common with many storage solutions, GlusterFS is destined to become much more automated, with provisioning by users and more policy-based management; Red Hat is already using terms like File-as-a-Service and NAS-on-Demand. Being a Linux-based solution, it's unlikely ever to lose the CLI control process, which makes large-scale operations more error-prone.

Additional features, such as deduplication and compression, are planned. Performance tuning is on the table, though there are questions about how best to handle all-SSD clusters, as with any software stack in storage today. Planned support for NVMe acknowledges that the issue is understood to some extent.

Data integrity issues still exist in GlusterFS because of the way data is replicated. Rack-level awareness is on the roadmap, to better distribute replicas and erasure-coded data sets.
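
What rack-level awareness means for placement can be sketched in a few lines of Python. This is a hypothetical illustration rather than anything on the GlusterFS roadmap: the `place_across_racks` function, the rack names, and the node names are all assumptions. The point is that each replica is forced onto a different rack, so a rack-wide failure (power or top-of-rack switch) cannot take out every copy.

```python
import hashlib


def place_across_racks(path, racks, replicas=3):
    """Return (rack, node) pairs for `path`, one per distinct rack.

    `racks` maps a rack name to a non-empty list of node names
    (an assumed layout, for illustration only).
    """
    rack_names = sorted(racks)
    if replicas > len(rack_names):
        raise ValueError("not enough racks to keep replicas apart")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")
    start = h % len(rack_names)
    chosen = []
    for i in range(replicas):
        # Step through consecutive racks so no rack is used twice.
        rack = rack_names[(start + i) % len(rack_names)]
        members = racks[rack]
        chosen.append((rack, members[h % len(members)]))
    return chosen


# Assumed topology: three racks with two nodes each.
racks = {
    "rack-1": ["n1", "n2"],
    "rack-2": ["n3", "n4"],
    "rack-3": ["n5", "n6"],
}
placement = place_across_racks("/media/stream-0001.ts", racks, replicas=3)
print(placement)
```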

There's no good answer to the question of whether to choose Ceph or Gluster, at least for now. Both have strengths and weaknesses, but not enough to give one model a decided edge. The fact that Red Hat drives both of them also reduces competition between them. Generally, Ceph addresses the object space very well, while Gluster might be the choice for data centers focused on more traditional file server networked storage.

Ceph is still coming to terms with the world of filers, and the file access gateway is still evolving towards full production. GlusterFS came to the object space late and it shows. Still, it seems likely that the overlap between them will increase as time goes on and the feature sets converge. This may not matter to Red Hat; they get paid either way, and both products are leaders.

Performance is something of a mystery. At time of press there were no published articles on relative performance that had real credibility. Configuration setup and software tuning have a big impact, and it seems that this is still an area needing more study.
