
Tuesday, August 19, 2014

Quick Reference Guide - CRUSH & Ceph


CRUSH


CRUSH (Controlled Replication Under Scalable Hashing) is a hash-based algorithm for calculating how and where to store and retrieve data in a distributed object-based storage cluster.

CRUSH is the pseudo-random data placement algorithm that efficiently distributes object replicas across a Ceph storage cluster. Cluster size needs to be flexible, and device failure is going to happen. CRUSH allows storage devices to be added and removed with as little movement of data as possible. Ceph uses the CRUSH algorithm to compute where data can be found or should be written, which eliminates metadata bottlenecks and increases the overall efficiency of accessing data in the cluster. Both Ceph clients accessing storage and Ceph devices replicating data to their peers run the CRUSH algorithm, so the work scales linearly with the size of the cluster.

CRUSH is an algorithm that can calculate the physical location of data in Ceph, given the object name, cluster map and CRUSH rules as input.
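To make that concrete, the sketch below mimics the first step of the calculation in Python: the object name is hashed to a placement group (PG) ID, the unit that CRUSH then maps onto OSDs. The hash function (crc32), the pool ID, and the PG count here are illustrative stand-ins; Ceph uses its own stable hash and per-pool settings, but the point is the same: the location is computed, never looked up.

    import zlib

    def object_to_pg(pool_id, object_name, pg_num):
        # Illustrative only: real Ceph hashes with rjenkins and uses a
        # "stable mod" so pg_num can grow gracefully, but the principle
        # is identical -- the PG is computed from the name alone.
        h = zlib.crc32(object_name.encode("utf-8"))
        return (pool_id, h % pg_num)

    # The same name yields the same PG on every client, with no lookup.
    print(object_to_pg(1, "myobject", pg_num=128))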

CRUSH distributes data evenly across available object storage devices in what is often described as a pseudo-random manner. Distribution is controlled by a hierarchical cluster map called a CRUSH map. The CRUSH map, which can be customized by the storage administrator, informs the cluster about the layout and capacity of nodes in the storage network and specifies how redundancy should be managed. 
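The snippet below sketches, under stated assumptions, how such a hierarchical map can drive placement. The map layout (rack, host, and OSD names) and the selection scheme (a rendezvous-style hash per candidate) are simplifications invented here for brevity; a real CRUSH map is a compiled structure with weights and bucket algorithms such as straw2. The effect illustrated is the real one, though: replicas land on distinct hosts in distinct racks purely by computation.

    import hashlib

    # Toy stand-in for a CRUSH map hierarchy: racks -> hosts -> OSDs.
    CLUSTER_MAP = {
        "rack1": {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"]},
        "rack2": {"host3": ["osd.4", "osd.5"], "host4": ["osd.6", "osd.7"]},
    }

    def _weight(pg, item):
        # Deterministic pseudo-random weight for a (PG, item) pair.
        return hashlib.sha1(f"{pg}:{item}".encode()).digest()

    def place(pg, cluster_map, replicas=2):
        # Pick one OSD per rack, i.e. spread replicas across fault domains.
        racks = sorted(cluster_map, key=lambda r: _weight(pg, r), reverse=True)[:replicas]
        acting = []
        for rack in racks:
            hosts = cluster_map[rack]
            host = max(hosts, key=lambda h: _weight(pg, h))
            acting.append(max(hosts[host], key=lambda o: _weight(pg, o)))
        return acting

    # Same PG in, same OSD set out -- on any client, with no directory service.
    print(place("pg.1.25", CLUSTER_MAP))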

CRUSH places replicas in multiple locations across separate fault domains, so when a disk fails, Ceph uses CRUSH to re-replicate the affected data across the remaining OSDs. This removes the need for RAID, which typically just adds to the hardware cost.
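One way to see the "as little movement of data as possible" behaviour is to compute placements for many objects, drop one device from the map, recompute, and count how many objects actually move. The toy below uses plain rendezvous (highest-random-weight) hashing over a flat list of OSDs rather than real CRUSH buckets, but it shows the same property: only the data that lived on the failed device gets a new home.

    import hashlib

    def pick_osd(name, osds):
        # Deterministically pick one OSD per object (rendezvous hashing).
        return max(osds, key=lambda osd: hashlib.sha1(f"{name}:{osd}".encode()).digest())

    osds = [f"osd.{i}" for i in range(10)]
    objects = [f"obj-{i}" for i in range(10000)]

    before = {o: pick_osd(o, osds) for o in objects}
    osds.remove("osd.3")                          # simulate a failed disk
    after = {o: pick_osd(o, osds) for o in objects}

    moved = sum(before[o] != after[o] for o in objects)
    print(f"{moved} of {len(objects)} objects re-placed")   # roughly one tenth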

Ceph and CRUSH let you build clusters from heterogeneous commodity hardware, which frees you from vendor lock-in and expensive proprietary storage. A CRUSH-managed cluster is also self-managing and self-healing, which reduces the need for human intervention in your data center.

CRUSH is one of the key features that makes Ceph powerful and uniquely scalable compared to other storage systems that we see today.


Ceph



Among Linux's impressive selection of file systems is Ceph, a distributed file system that incorporates replication and fault tolerance while maintaining POSIX compatibility.

CRUSH was designed for Ceph, open source software that provides object-, block- and file-based storage under a unified system. Because CRUSH allows clients to communicate directly with storage devices, without a central index server managing data object locations, Ceph clusters can store and retrieve data very quickly and scale up or down quite easily.
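For example, a client that talks to the object layer through librados needs only the cluster configuration and a pool name; the library computes placements with CRUSH locally and contacts the right OSDs directly. Below is a minimal sketch using the python-rados binding, assuming a reachable cluster, a readable /etc/ceph/ceph.conf, and an existing pool named "data":

    import rados

    # Connect using the local cluster configuration and default keyring.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()

    # The pool name "data" is an assumption; use any pool that exists.
    ioctx = cluster.open_ioctx("data")
    try:
        ioctx.write_full("hello-object", b"Hello, Ceph!")  # placement computed, not looked up
        print(ioctx.read("hello-object"))
    finally:
        ioctx.close()
        cluster.shutdown()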

As if the dynamic and adaptive nature of the file system weren't enough, Ceph also implements some interesting features visible to the user. Users can create snapshots, for example, in Ceph on any subdirectory (including all of the contents). It's also possible to perform file and capacity accounting at the subdirectory level, which reports the storage size and number of files for a given subdirectory (and all of its nested contents).
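Both features are exposed through the ordinary filesystem interface. The sketch below assumes a CephFS mount at /mnt/cephfs/projects with snapshots enabled (the path is hypothetical): a snapshot is created simply by making a directory under the hidden .snap directory, and the recursive accounting is read back through CephFS's virtual extended attributes such as ceph.dir.rbytes and ceph.dir.rfiles.

    import os

    MOUNT = "/mnt/cephfs/projects"   # assumed CephFS subdirectory

    # Snapshot: creating a directory under .snap snapshots this subtree
    # (requires snapshots to be enabled on the file system).
    os.mkdir(os.path.join(MOUNT, ".snap", "before-cleanup"))

    # Recursive accounting: total bytes and file count for this directory
    # and everything nested beneath it, reported by the MDS.
    rbytes = int(os.getxattr(MOUNT, "ceph.dir.rbytes"))
    rfiles = int(os.getxattr(MOUNT, "ceph.dir.rfiles"))
    print(f"{MOUNT}: {rfiles} files, {rbytes} bytes (recursive)")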

Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability.


Goals


  • Easy scalability to multi-petabyte capacity (a requirement for big data workloads)
  • High performance over varying workloads (input/output operations per second [IOPS] and bandwidth)
  • Strong reliability


Note:


Ceph is not only a file system but an object storage ecosystem with enterprise-class features. Ceph isn't unique in the distributed file system space, but it is unique in the way it manages a large storage ecosystem. Other examples of distributed file systems include the Google File System (GFS), the General Parallel File System (GPFS), and Lustre. The ideas behind Ceph suggest an interesting future for distributed file systems, as growth to massive scale introduces unique challenges to the storage problem.

Although Ceph is now integrated into the mainline Linux kernel, it's properly noted there as experimental. File systems in this state are useful to evaluate but are not yet ready for production environments. 





