A topic is the basic primitive for data stream and the partitions is the unit of parallelism accessed independently by producers and consumers. A topic may be split in any number of partitions.
Topics define a data streams, and partitions the number of data slices for each stream. Topics also have a replication factor that defines durability, the number of copies for each data slice.
Replicas have a leader and one or more followers and distributed across all available SPUs according to the replica assignment algorithm.
For example, when provisioning a topic with 2 partitions and 3 replicas:
$ fluvio topic create --topic topic-a --partitions 2 --replication 3
Leaders maintain the primary data set and followers store a copy of the data. Leaders and followers map to independent SPUs:
Partition are configuration objects managed by the system. Topics and partitions are linked through a parent-child relationship. Partition generation algorithm is described in the SC Architecture.
If a topic is deleted, all child partitions are automatically removed.
When producing records to a Topic that has multiple partitions, there are two cases to consider when determining the partitioning behavior. These cases are:
When producing records with keys, the producers will use hash partitioning, where the partition number is derived from the hash of the record’s key. This is used to uphold the golden rule of key-based partitioning:
Records with the same key are always sent to the same partition
The current implementation of key hashing uses the sip-2-4 hashing algorithm, but that is subject to change in the future.
When producing records with no keys, producers will simply assign partition numbers to incoming records in a round-robin fashion, spreading the load evenly across the available partitions.
Currently, consumers are limited to reading from one partition at a time. This means that in order to read all records from a given topic, it may be necessary to instantiate one consumer per partition in the topic.