Posts

Streaming analytics in Scala using ZIO and Apache DataSketches (part 1)

First let's look at a trivial example of streaming analytics using ZIO Streams 2.0.9 (on Scala 2.13): https://scastie.scala-lang.org/6AHfPh02S6m0k7aN6bkacA When run, this Scastie simply averages some random (Gaussian) numbers in a single input stream. That code is in the trait AvgFun. But we also see a trait called ParAvgFun, with code for a rather painless parallel implementation using ZIO chunks, via ZStream's grouped, zipWithIndex, and groupByKey. Across these two traits, the clean reusability of ZSinks in different pipelines shows a nice design win for the ZIO Streams API. The code in ParAvgFun.parAvgUsingChunks demonstrates a terse and reasonably efficient design for accumulating results using parallel ZStreams (within a single process). It is applicable to simple cases where the stream elements are lightweight and similar, and the underlying operations are uniformly easy (like summing numbers). When instead there is great variability in the cost of processing…
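The shapes described above can be sketched in a few lines of ZIO 2. This is my own minimal sketch, not the code from the linked Scastie: the names AvgSketch, avgSink, gaussians, and parAvg are illustrative, and the parallel variant below uses only grouped with mapZIOPar, a simpler scheme than the zipWithIndex/groupByKey approach of ParAvgFun.

```scala
import zio._
import zio.stream._

object AvgSketch extends ZIOAppDefault {

  // A reusable ZSink that folds Doubles into (sum, count), then maps to the mean.
  val avgSink: ZSink[Any, Nothing, Double, Nothing, Double] =
    ZSink
      .foldLeft[Double, (Double, Long)]((0.0, 0L)) { case ((sum, n), x) => (sum + x, n + 1) }
      .map { case (sum, n) => if (n == 0) 0.0 else sum / n }

  // A finite stream of Gaussian random numbers.
  val gaussians: ZStream[Any, Nothing, Double] =
    ZStream.repeatZIO(Random.nextGaussian).take(10000)

  // Sequential average, in the spirit of AvgFun.
  val avg: ZIO[Any, Nothing, Double] = gaussians.run(avgSink)

  // A chunked parallel variant: partial (sum, count) pairs are computed over
  // chunks of 1000 elements, up to 4 at a time, then merged by a second fold sink.
  val parAvg: ZIO[Any, Nothing, Double] =
    gaussians
      .grouped(1000)
      .mapZIOPar(4)(chunk => ZIO.succeed((chunk.sum, chunk.size.toLong)))
      .run(ZSink.foldLeft[(Double, Long), (Double, Long)]((0.0, 0L)) {
        case ((s, n), (cs, cn)) => (s + cs, n + cn)
      })
      .map { case (s, n) => if (n == 0) 0.0 else s / n }

  def run = avg.zip(parAvg).flatMap { case (a, p) =>
    Console.printLine(s"avg = $a, parAvg = $p")
  }
}
```

Note how the same fold-to-pair pattern serves both pipelines; this is the ZSink reusability the post points at.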

AxLam Example: Questions about Triangles

Consider the humble, mighty triangle: a connected shape with 3 vertices, 3 sides, and 3 angles. We will experiment with AxLam models for describing and computing with triangles, individually and in groups, at several levels of mathematical rigor. On this road we touch briefly on some fundamental aspects of mathematics and information, while staying intuitively grounded in the triangle simplex. In a computer-assisted modeling paradigm, it is natural to proceed constructively. An intuitive way to start modeling triangles is using the lengths of the 3 sides of the triangle. In any programming or data language, we might quickly write down an ADT, record, or class containing three numeric sides, looking something like: class TriangleSidesRecord(sideA: Number, sideB: Number, sideC: Number) This structure is sufficient to begin computing with triangles. However, when we examine the semantics of this simple record structure, we quickly find…
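One way to render the TriangleSidesRecord sketch concretely is as a Scala case class. This is my own speculative elaboration, not code from the post; the isValidTriangle and area members are illustrative, and they hint at exactly the semantic gap the post raises: not every triple of numbers denotes a triangle.

```scala
// Hypothetical concrete form of the post's TriangleSidesRecord sketch.
final case class TriangleSidesRecord(sideA: Double, sideB: Double, sideC: Double) {

  // A bag of three numbers only denotes a triangle when each side is positive
  // and shorter than the sum of the other two (the triangle inequality).
  def isValidTriangle: Boolean =
    sideA > 0 && sideB > 0 && sideC > 0 &&
      sideA < sideB + sideC && sideB < sideA + sideC && sideC < sideA + sideB

  // Heron's formula for area, defined only when the record is a valid triangle.
  def area: Option[Double] =
    if (!isValidTriangle) None
    else {
      val s = (sideA + sideB + sideC) / 2.0
      Some(math.sqrt(s * (s - sideA) * (s - sideB) * (s - sideC)))
    }
}
```

For example, TriangleSidesRecord(3, 4, 5) is a valid right triangle with area 6, while TriangleSidesRecord(1, 2, 5) type-checks happily but denotes no triangle at all.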

Overview of AxLam model : Nouns, Types, Morphisms

2023-Feb-03, picking up from the tweet thread of 220909. AxLam model: Noun (space), Type (paint), Morphism (arrow). AxLam is for writing models using [concepts, types, and morphisms]. Our concepts may be in the form of some data, or less specifically may be things we choose to name or refer to in a model. Over this concept space, we use types and morphisms to define an adjustable descriptive model that directs and regulates processing of the concept data (or other manifestation). Alternatively and more approachably, we may write in terms of [nouns, formal classes, and mappings]; this framing is mostly equivalent for informal purposes. Nouns / Concepts / Terms / Data: Nouns (i.e. concepts) describe anything we wish to refer to, and need not be explicit or named. We think of nouns as the space of things we might try to describe, name, or refer to. "Terms" in a type-theoretic sense are nouns. Real things in the human world that someone cares about are nouns. In practice when we…
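The [noun, type, morphism] triple can be given a tiny concrete encoding. This is purely my own sketch of the framing, not part of AxLam itself: a noun is an opaque term, a type "paints" a subset of the noun space as a predicate, and a morphism is an arrow between painted regions.

```scala
// Speculative toy encoding of the AxLam framing; all names here are mine.
final case class Noun(name: String)

// A type as a predicate over the noun space ("paint" over the space).
final case class AxType(label: String, holds: Noun => Boolean)

// A morphism as a named arrow between nouns, with declared source and target types.
final case class Morphism(label: String, from: AxType, to: AxType, arrow: Noun => Noun) {
  // The arrow is well-typed at n when `from` paints n and `to` paints its image.
  def wellTypedAt(n: Noun): Boolean = from.holds(n) && to.holds(arrow(n))
}
```

For instance, with types Lowercase and Capitalized painted over string-named nouns, a capitalize morphism is well-typed at Noun("cat") but not at Noun("Cat"), which already lies outside its source paint.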

Short CS Reading List for 2022

Here are a few useful recent computer science papers and books available as PDF downloads. Machine Learning and Big Data Some practical computation topics of current interest: "Category Theory in Machine Learning" by Shiebler, Gavranovic, Wilson, 43 pages, uploaded to arXiv 2021-06-13: arxiv.org/abs/2106.07032 Good context here for thinking about Probability Monads (e.g. Giry). See page 22. "Zero-Cost, Arrow-Enabled Data Interface for Apache Spark" by Rodriguez, Chakraborty, Chu et al., 6 pages, first uploaded to arXiv on 2021-06-24, updated 2021-11-27: arxiv.org/abs/2106.13020 "Dive into Deep Learning", interactive book at d2l.ai, a 1,011-page(!) compendium of ML approaches, useful for review, with an attempted consistent vocabulary. Usable as HTML, intended to be agnostic about ML platform, and comes in several editions. Stu B. uses the MXNet-variant PDF download.

recent work in math knowledge rep 2021Q3

2021-Q3 survey of recent work in math knowledge rep Background definitions and motives FAIR Data: (wikipedia) data which meet principles of findability, accessibility, interoperability, and reusability. See the go-fair principles in more depth. Some surveys: "(Deep) FAIR Mathematics" (KWARC) 2020-02-25 by Katja Berčič, Michael Kohlhase, Florian Rabe. "Big Math and the One-Brain Barrier: The Tetrapod Model of Mathematical Knowledge" by Jacques Carette, William M. Farmer, Michael Kohlhase, Florian Rabe. An expanded group from McMaster + KWARC shared working notes and tables as of 2020-Feb in: "The Space of Mathematical Software Systems -- A Survey of Paradigmatic Systems". "A Survey of Languages for Formalizing Mathematics" (KWARC) 2020, by Cezary Kaliszyk and Florian Rabe. Some proceedings: Accepted Papers at ITP2021 -- 12th International Conference on Interactive Theorem Proving. ITP2021 Proceedings as a single PDF. CADE 2021 Proceedings (open access at Springer…

AxioMagic under Spark on cloud (AWS, Azure, ...)

AxioMagic under Spark on ${cloud} AxioMagic relies on functional composition over immutable inputs, thus matching modern evaluation frameworks like Apache Spark, which defines a structured approach to managing data contexts of functional evaluation over data frames. Spark embodies a refinement of both ETL and MapReduce methodology. State machines like Spark require computation and storage resources, which may come from any cluster but today are commonly provided from cloud environments, yielding high-volume functional computation at predictable usage-based costs. We especially value Spark for open scientific dataset analysis, where it is common to work with large data files in interchange formats. Essential features of Spark distributed evaluation Spark assembles batch pipelines to pump datasets of structured records through functional tasks (which may invoke side-effects, with all necessary stipulations). This pipeline serves to evaluate a given distributed functional algorithm…
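The pipeline shape described above, structured records flowing through composed functional tasks, looks something like the following minimal sketch. This is an illustrative example of Spark's typed Dataset API and not AxioMagic code; the Reading record, field names, and local-mode session are my own assumptions.

```scala
// A minimal sketch of Spark's functional evaluation over immutable records.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

final case class Reading(sensor: String, value: Double)

object SparkSketch {
  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration; on a cluster, resources come from
    // the cloud environment mentioned above.
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // An immutable input dataset of structured records.
    val readings = Seq(Reading("a", 1.0), Reading("a", 3.0), Reading("b", 5.0)).toDS()

    // A purely compositional pipeline: filter, group, aggregate. The code only
    // composes transformations; Spark plans and distributes the evaluation.
    val avgBySensor = readings
      .filter(_.value > 0.0)
      .groupByKey(_.sensor)
      .agg(avg($"value").as[Double])

    avgBySensor.show()
    spark.stop()
  }
}
```

Note that nothing is computed until the terminal show() action; the pipeline itself is just an immutable description, which is exactly the match with functional composition claimed above.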

Domain types based on probability monads

Image
(Important: I did not create the image shown in the blogger preview - it is from the NCat Lab website!) Composed stochastic proof for low-dimensional applications Following up on the previous post on Composed Stochastic Proof, we now enumerate: What the typed world owes the authored world. Compile-time access to categorical knowledge about probability monads (overview page at the NCat Lab site) over our dependent type system for measure theory (at the MathLib/Lean site). To knit this knowledge into useful, portable applications, we proceed with four major tasks: Task 1) Starting from a shared library of conceptual math knowledge, we identify applicable categories that are useful in our application model. The NCatLab website is a great current resource on existing work. The page on monads of probability, measures, and valuations contains a summary table of the most useful, widely known functional probability concepts. These are the core ingredient math concepts needed for robust, general processing…
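To make the probability-monad idea tangible, here is a toy finitely-supported distribution monad, a discrete cousin of the Giry monad discussed on the NCat Lab pages. The sketch is mine, not from the post or from NCat Lab: Dist, pure, and uniform are illustrative names, and real measure-theoretic formulations need the dependent types mentioned above.

```scala
// A toy finitely-supported probability monad (discrete cousin of the Giry monad).
final case class Dist[A](weights: Map[A, Double]) {

  def map[B](f: A => B): Dist[B] =
    Dist(weights.groupMapReduce { case (a, _) => f(a) } { case (_, p) => p }(_ + _))

  // Monadic bind: push each outcome through f and mix the resulting distributions.
  def flatMap[B](f: A => Dist[B]): Dist[B] =
    Dist(
      weights.toSeq
        .flatMap { case (a, p) => f(a).weights.toSeq.map { case (b, q) => (b, p * q) } }
        .groupMapReduce(_._1)(_._2)(_ + _)
    )
}

object Dist {
  // Monadic unit: the point mass (Dirac distribution) at a.
  def pure[A](a: A): Dist[A] = Dist(Map(a -> 1.0))

  def uniform[A](as: A*): Dist[A] = Dist(as.map(_ -> 1.0 / as.size).toMap)
}
```

With this in hand, composing two fair coin flips via flatMap yields the distribution over their sum, and the monad laws correspond to the categorical facts the NCat Lab summary table catalogs.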