The Trim Horizon Of AWS Kinesis And The Concept Of Checkpointing

The consumer has checkpointed the shard at TRIM_HORIZON but hasn't started processing it

1. SHARDS, STREAMS, AND ITERATORS

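  • TRIM_HORIZON is a type of shard iterator: it points at the oldest record still available in the shard (the oldest one not yet trimmed by the retention period)
  • a shard is a uniquely identified sequence of data records within the stream; it is also the stream's unit of capacity
  • a shard iterator defines the position in the shard from which to start consuming data records sequentially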
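
To make the relationship between stream, shard, and iterator concrete, here is a minimal sketch of reading the first batch of records from the oldest available position. It assumes `client` is a boto3 Kinesis client (or any object exposing the same `get_shard_iterator` and `get_records` methods); the function name `read_from_trim_horizon` and the stream and shard names in the usage note are hypothetical.

```python
def read_from_trim_horizon(client, stream_name, shard_id, limit=100):
    """Fetch one batch of records, starting at the oldest untrimmed record.

    `client` is assumed to behave like boto3.client("kinesis").
    """
    # Ask Kinesis for an iterator positioned at the trim horizon of the shard.
    response = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )
    # Use that iterator to read records sequentially from that position.
    batch = client.get_records(
        ShardIterator=response["ShardIterator"],
        Limit=limit,
    )
    # NextShardIterator is what the next get_records call would continue from.
    return batch["Records"], batch.get("NextShardIterator")
```

With real AWS credentials this could be called as `read_from_trim_horizon(boto3.client("kinesis"), "my-stream", "shardId-000000000000")`; passing the client in as a parameter also makes it easy to try the function against a stub.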

2. SHARD ITERATOR

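  • the starting position can be of various types: TRIM_HORIZON (oldest available record), LATEST (just after the most recent record), AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, and AT_TIMESTAMP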
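
The iterator types differ in what extra information they need: the two sequence-number types require a starting sequence number, and AT_TIMESTAMP requires a timestamp. The sketch below builds the keyword arguments for a `get_shard_iterator` call and rejects incomplete combinations. The helper name `iterator_request` and its parameter names are assumptions made for illustration; the dictionary keys follow the GetShardIterator request fields.

```python
# Iterator types that must be paired with a StartingSequenceNumber.
SEQUENCE_TYPES = {"AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"}
# All shard iterator types accepted by GetShardIterator.
ALL_TYPES = SEQUENCE_TYPES | {"AT_TIMESTAMP", "TRIM_HORIZON", "LATEST"}


def iterator_request(stream_name, shard_id, iterator_type,
                     sequence_number=None, timestamp=None):
    """Build keyword arguments for get_shard_iterator (hypothetical helper)."""
    if iterator_type not in ALL_TYPES:
        raise ValueError(f"unknown shard iterator type: {iterator_type}")

    request = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,
    }
    if iterator_type in SEQUENCE_TYPES:
        if sequence_number is None:
            raise ValueError(f"{iterator_type} needs a sequence number")
        request["StartingSequenceNumber"] = sequence_number
    elif iterator_type == "AT_TIMESTAMP":
        if timestamp is None:
            raise ValueError("AT_TIMESTAMP needs a timestamp")
        request["Timestamp"] = timestamp
    # TRIM_HORIZON and LATEST need nothing beyond the type itself.
    return request
```

The result can be unpacked into the client call, for example `client.get_shard_iterator(**iterator_request("my-stream", "shardId-000000000000", "TRIM_HORIZON"))`.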

3. CHECKPOINTING

  • checkpointing is a mechanism that lets you restart stream processing from the last checkpointed position
  • …instead of from the earliest available record (TRIM_HORIZON) or from “now” (LATEST)
  • In general, the goal is to use Kinesis to drive useful processing; reprocessing duplicate records is usually not useful (and just costs you money, paid to AWS)
  • Checkpointing therefore usually means less time and money wasted on reprocessing duplicate records.
  • You can checkpoint on a time basis (every X seconds), on a record basis (every Y records), after every batch, never, or on any other schedule; it all depends on how much waste you can tolerate in the event of a failure.
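
The trade-off in the last bullet can be sketched in a few lines. The example below is a simplified illustration, not how any particular consumer library implements checkpointing: `CheckpointPolicy`, `process`, and `resume_position` are hypothetical names, and a plain dict stands in for a durable checkpoint store. It checkpoints every Y records or every X seconds, whichever comes first, and on restart resumes after the last checkpointed sequence number, falling back to TRIM_HORIZON when no checkpoint exists.

```python
import time


class CheckpointPolicy:
    """Signal a checkpoint every `max_records` records or `max_seconds` seconds."""

    def __init__(self, max_records=100, max_seconds=60.0, clock=time.monotonic):
        self.max_records = max_records
        self.max_seconds = max_seconds
        self._clock = clock
        self._pending = 0          # records processed since the last checkpoint
        self._last = clock()       # time of the last checkpoint

    def record_processed(self):
        """Count one processed record; return True when a checkpoint is due."""
        self._pending += 1
        due = (self._pending >= self.max_records
               or self._clock() - self._last >= self.max_seconds)
        if due:
            self._pending = 0
            self._last = self._clock()
        return due


def process(records, store, shard_id, policy, handle):
    """Handle each record and save its sequence number when the policy says so."""
    for record in records:
        handle(record)
        if policy.record_processed():
            store[shard_id] = record["SequenceNumber"]


def resume_position(store, shard_id):
    """Choose iterator arguments for a restart of this shard."""
    last = store.get(shard_id)
    if last is None:
        # Nothing checkpointed yet: start from the oldest available record.
        return {"ShardIteratorType": "TRIM_HORIZON"}
    # Continue right after the last record known to be processed.
    return {"ShardIteratorType": "AFTER_SEQUENCE_NUMBER",
            "StartingSequenceNumber": last}
```

With `max_records=2`, a crash after the fifth record means only that fifth record is processed again on restart; without any checkpoint the consumer would go back to TRIM_HORIZON and redo all five. Larger intervals mean fewer writes to the checkpoint store but more duplicate work after a failure.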

Pavol Kutaj

Infrastructure Support Engineer/Technical Writer (Snowplow Analytics) with a passion for Python/writing documentation. More about me: https://pavol.kutaj.com