The Trim Horizon Of AWS Kinesis And The Concept Of Checkpointing
These notes document the concepts of TRIM_HORIZON and checkpointing in AWS Kinesis.
- Understanding them helps explain errors I have been seeing lately, such as:
The consumer has checkpointed the shard at TRIM_HORIZON but hasn't started processing it
1. SHARDS, STREAMS, AND ITERATORS
- TRIM_HORIZON is a type of shard iterator
- a shard is a uniquely identified sequence of data records within a stream, and also the stream's unit of throughput capacity
- a shard iterator defines the position in a shard from which to start reading data records sequentially
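
To make the relationship between stream, shard, and iterator concrete, here is a minimal sketch of reading a shard from TRIM_HORIZON. It assumes a boto3-style Kinesis client is passed in (for example one created with boto3.client("kinesis")); the function name and the stream and shard names in the usage comment are hypothetical.

```python
def read_oldest_records(client, stream_name, shard_id, limit=100):
    """Read up to `limit` records, starting at the oldest record
    still retained in the given shard (TRIM_HORIZON)."""
    # Ask Kinesis for an iterator positioned at the trim horizon.
    response = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )
    # Use the iterator to fetch one batch of records.
    batch = client.get_records(
        ShardIterator=response["ShardIterator"],
        Limit=limit,
    )
    # NextShardIterator is what a caller would pass to the next get_records call.
    return batch["Records"], batch.get("NextShardIterator")


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("kinesis")
#   records, next_iterator = read_oldest_records(
#       client, "my-stream", "shardId-000000000000")
```

The iterator is only a position: it says where in the shard to start, and each call to get_records returns the iterator for the following position.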
2. SHARD ITERATOR
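- a shard iterator's starting position can be one of several types: TRIM_HORIZON (the oldest record still retained in the shard), LATEST (just after the most recent record), AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, or AT_TIMESTAMP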
- checkpointing is essentially a mechanism allowing you to restart stream processing from the last checkpointed position
- …instead of at the earliest available record (TRIM_HORIZON) or “now” (LATEST)
- In general, the goal is to use Kinesis to drive useful processing — usually reprocessing duplicate records is not useful (and just costs you money, paid to AWS)
- Checkpointing often means less time and money wasted reprocessing duplicate records.
- You can checkpoint on a time-basis (every X seconds), record-basis (every Y records), every batch, never, or whatever you want — it all depends on how much waste you can tolerate in the event of a failure.