Streaming analytics in Scala using ZIO and Apache DataSketches (part 1)
First let's look at a trivial example of streaming analytics using ZIO Streams 2.0.9 (on Scala 2.13).
When run, this Scastie just averages some random numbers (gaussian) in a single input stream. That code is in the trait AvgFun.
But we also see a trait called ParAvgFun with the code for rather painless parallel implementation, using ZIO chunks, via ZStream's grouped, zipWithIndex, and groupByKey. Across these two traits, the clean reusability of ZSinks in different pipelines shows a nice design win for the ZIO Streams API.
The following two images show a preview of the same code. When run, the Scastie should print a result as shown (the dark background is console output), which reminds us that we are running only the single-streamed average here. I left the parallel job triggering out to avoid overusing the resources of the friendly Scastie service.
In a followup post we will go deeper into streaming analytics, using this ZIO code pattern together with the Apache DataSketches library.