Introduction to Apache Cassandra
Last Updated: 25 Sep, 2024
Cassandra is an open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is written in Java and developed by the Apache Software Foundation.
Avinash Lakshman and Prashant Malik initially developed Cassandra at Facebook to power the Facebook inbox search feature. Facebook released Cassandra as an open-source project on Google Code in July 2008. It became an Apache Incubator project in March 2009 and an Apache top-level project in February 2010. Cassandra's popularity stems largely from its outstanding technical features.

Introduction to Cassandra
Apache Cassandra is used to manage very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure. Listed below are some key points about Apache Cassandra:
- It is scalable, fault-tolerant, and consistent.
- It is a column-oriented database.
- Its distributed design is based on Amazon's Dynamo and its data model on Google's Bigtable.
- It was created at Facebook and differs sharply from relational database management systems.
Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful "column family" data model. Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix. The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra uses a peer-to-peer distributed architecture, and data is distributed among all the nodes of the cluster. All nodes in a Cassandra cluster play the same role: each node is independent and, at the same time, interconnected with the other nodes. Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read/write requests can be served by other nodes in the network.
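The Dynamo-style placement described above can be sketched as a toy replica-placement function: hash the partition key onto the token ring, then walk clockwise collecting the required number of distinct replica nodes. This is a minimal sketch under stated assumptions; the five node names, the tiny hash, and the 0..999 ring are illustrative, not Cassandra's real Murmur3 partitioner or SimpleStrategy implementation.

```python
from bisect import bisect_left

# Illustrative ring: (token, node) pairs in ascending token order.
# Each node owns the range ending at its token.
RING = [(0, "N1"), (200, "N2"), (400, "N3"), (600, "N4"), (800, "N5")]

def token_for(key: str, ring_size: int = 1000) -> int:
    # Toy stand-in for a partitioner hash (NOT Murmur3).
    return sum(ord(c) for c in key) * 31 % ring_size

def replicas(key: str, replication_factor: int = 3) -> list[str]:
    tokens = [t for t, _ in RING]
    # Find the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if necessary...
    start = bisect_left(tokens, token_for(key)) % len(RING)
    # ...then continue clockwise to collect the remaining replicas.
    return [RING[(start + i) % len(RING)][1]
            for i in range(replication_factor)]
```

Because every node can run this same placement logic, any node can act as the coordinator for a request and forward it to the right replicas, which is why no single node is a point of failure.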
Features of Cassandra:
Cassandra has become popular because of its technical features. Some of the key features of Cassandra are:
1. Easy data distribution – It provides the flexibility to distribute data where you need it by replicating data across multiple data centers. For example, if there are five nodes, say N1, N2, N3, N4, and N5, a partitioning algorithm decides the token ranges and distributes data accordingly. Each node owns a specific token range into which its data falls. The diagram below illustrates this.

Ring structure with token range.
2. Flexible data storage – Cassandra accommodates all possible data formats, including structured, semi-structured, and unstructured data. It can dynamically accommodate changes to your data structures according to your needs.
3. Elastic scalability – Cassandra is highly scalable and allows you to add more hardware to accommodate more customers and more data as required.
4. Fast writes – Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
5. Always-on architecture – Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford downtime.
6. Fast linear-scale performance – Cassandra scales linearly, so throughput increases as you add nodes to the cluster, while response times remain fast.
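The token-range partitioning in the N1..N5 example above can be sketched as follows. This is a simplified illustration: real Cassandra uses the Murmur3 partitioner's range (-2^63 to 2^63 - 1) and, typically, many virtual nodes (vnodes) per physical node; the tiny 0..999 ring here is only for readability.

```python
def token_ranges(nodes, ring_min=0, ring_max=999):
    """Assign each node a contiguous, non-overlapping token range."""
    span = ring_max - ring_min + 1
    step = span // len(nodes)
    ranges = {}
    for i, node in enumerate(nodes):
        lo = ring_min + i * step
        # The last node absorbs any remainder so the whole ring is covered.
        hi = ring_max if i == len(nodes) - 1 else lo + step - 1
        ranges[node] = (lo, hi)
    return ranges

def owner(token, ranges):
    """Return the node whose token range contains the given token."""
    for node, (lo, hi) in ranges.items():
        if lo <= token <= hi:
            return node
```

With five nodes, each node ends up owning roughly one fifth of the ring, and adding a sixth node simply redraws the ranges, which is what makes the elastic scalability described above possible.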