Kinesis

Description

Allows you to gather data from different sources using data streams or Firehose delivery streams, and get that data to a destination in the expected format.
Allows you to collect, process and analyze data in near real-time at scale.
Kinesis Data Streams are used where an unbounded stream of data needs to be worked on in real time.
Kinesis Firehose delivery streams are used when data needs to be delivered to a storage destination, such as S3.
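
As a minimal sketch of the data-stream side, assuming the AWSSDK.Kinesis NuGet package and the "input-stream" name reused by the producer example below, a stream is created once and then written to and read from continuously:

using Amazon;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

var kinesisClient = new AmazonKinesisClient(new AmazonKinesisConfig { RegionEndpoint = RegionEndpoint.USEast1 });

// create the unbounded stream once; producers and consumers then use it continuously
await kinesisClient.CreateStreamAsync(new CreateStreamRequest
{
    StreamName = "input-stream",
    ShardCount = 1              // throughput scales with the number of shards
});

// wait until the stream becomes ACTIVE before putting records into it
DescribeStreamSummaryResponse summary;
do
{
    await Task.Delay(2000);
    summary = await kinesisClient.DescribeStreamSummaryAsync(
        new DescribeStreamSummaryRequest { StreamName = "input-stream" });
} while (summary.StreamDescriptionSummary.StreamStatus != StreamStatus.ACTIVE);

A single shard keeps the sketch simple; throughput is scaled by adding shards (or by using the on-demand capacity mode instead of a fixed shard count).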

Common use cases

  • pattern detection
  • clickstream analysis (who is clicking on what and when)
  • log processing for machine learning
  • anomaly detection in IoT devices

Benefits

  • fully managed service (focus on the data and don't worry about the underlying infrastructure)
  • can handle large amounts of data
  • can consume, process and buffer data in real-time
  • allows production of real-time metrics and reporting

Data Stream

Moves data from sources to a destination while providing analytics, monitoring, alerts, and connections to other services.

Common use cases

  • log and data intake and processing
  • real-time metrics, reporting, and data analytics

Benefits

  • ensures data durability and elasticity

Producers

Producers put records into Kinesis Data Streams.

  • EventBridge
  • CloudFront, collect views and analyze real-time CDN metrics
  • CloudWatch
C# producer example, putting one JSON record per second into the data stream:
using System.Text;
using Amazon;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;
using Newtonsoft.Json;

var kinesisDataStreamName = "input-stream";
var region = RegionEndpoint.USEast1;
var kinesisConfiguration = new AmazonKinesisConfig
{
    Profile = new Profile("..."),
    RegionEndpoint = region
};
var kinesisClient = new AmazonKinesisClient(kinesisConfiguration);

for (var i = 1; ; i++)
{
    var message = new Message(i, DateTime.Now);
    var data = JsonConvert.SerializeObject(message);

    // a record is a binary payload plus a partition key; the partition key selects the shard
    using var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(data));
    var request = new PutRecordRequest
    {
        StreamName = kinesisDataStreamName,
        Data = memoryStream,
        PartitionKey = "partitions1"
    };

    var response = await kinesisClient.PutRecordAsync(request);
    Console.WriteLine($"Shard ID: {response.ShardId}, Sequence Number: {response.SequenceNumber}, HTTPStatusCode: {response.HttpStatusCode}");

    await Task.Delay(1000); // put one record every second
}

class Message(int id, DateTime date)
{
    public int Id { get; set; } = id;
    public DateTime Date { get; set; } = date;
}

Consumers

Consumers get records from Kinesis Data Streams and process them; a minimal polling sketch follows the list below.

  • Data Firehose
  • Lambda, reads records in batches
  • Glue, ETL
  • SNS
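
A minimal polling-consumer sketch, assuming the single-shard "input-stream" used above and the AWSSDK.Kinesis package; in practice a Lambda trigger or the Kinesis Client Library handles multiple shards and checkpointing:

using System.Text;
using Amazon;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

var kinesisClient = new AmazonKinesisClient(new AmazonKinesisConfig { RegionEndpoint = RegionEndpoint.USEast1 });

// read from a single shard of the stream (assumes one shard for simplicity)
var shards = await kinesisClient.ListShardsAsync(new ListShardsRequest { StreamName = "input-stream" });
var shardId = shards.Shards[0].ShardId;

// start at the oldest record still retained in the shard
var iterator = (await kinesisClient.GetShardIteratorAsync(new GetShardIteratorRequest
{
    StreamName = "input-stream",
    ShardId = shardId,
    ShardIteratorType = ShardIteratorType.TRIM_HORIZON
})).ShardIterator;

while (iterator != null)
{
    var batch = await kinesisClient.GetRecordsAsync(new GetRecordsRequest { ShardIterator = iterator, Limit = 100 });

    foreach (var record in batch.Records)
    {
        var json = Encoding.UTF8.GetString(record.Data.ToArray());
        Console.WriteLine($"{record.SequenceNumber}: {json}");
    }

    iterator = batch.NextShardIterator;   // null once the shard is closed
    await Task.Delay(1000);               // poll roughly once per second
}

TRIM_HORIZON starts at the oldest available record; LATEST would start at the tip of the shard and only read new records.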

Amazon Data Firehose

  • ingest data from different sources
  • transform source records using AWS Lambda functions
  • deliver data to a specified destination

Similar to a Data Stream, but acts as an ETL (extract, transform, load) service.
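
A minimal sketch of sending a record directly to a Firehose delivery stream, assuming the AWSSDK.KinesisFirehose package and a hypothetical delivery stream named "orders-firehose" that already exists with an S3 destination:

using System.Text;
using Amazon;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;
using Newtonsoft.Json;

var firehoseClient = new AmazonKinesisFirehoseClient(new AmazonKinesisFirehoseConfig { RegionEndpoint = RegionEndpoint.USEast1 });

// newline-delimited JSON keeps the objects separable once Firehose concatenates them into S3 files
var data = JsonConvert.SerializeObject(new { Id = 1, Date = DateTime.Now }) + "\n";
using var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(data));

var response = await firehoseClient.PutRecordAsync(new PutRecordRequest
{
    DeliveryStreamName = "orders-firehose",        // hypothetical delivery stream name
    Record = new Record { Data = memoryStream }
});

Console.WriteLine($"RecordId: {response.RecordId}");

Unlike a data stream, there is no partition key or shard iterator to manage: Firehose buffers incoming records and flushes them to the destination based on the configured buffer size and interval.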

Source

  • Kinesis Data Streams

Destination

  • S3, Custom Prefixes for Amazon S3 Objects (https://docs.aws.amazon.com/firehose/latest/dev/s3-prefixes.html), Amazon Data Firehose custom prefixes for Amazon S3 objects (https://aws.amazon.com/blogs/big-data/amazon-data-firehose-custom-prefixes-for-amazon-s3-objects/)
  • MongoDB

Data Analytics

  • Studio Notebook
  • Glue catalog