Slideshare hasn't imported my notes, so here's the link to the Google Presentation: https://goo.gl/Gl4Vhm
Haskell is a statically typed, non-strict, purely functional programming language. It is often talked and blogged about, but rarely used commercially. This talk starts with a brief overview of the language, then explains how Haskell is evaluated and how it deals with non-determinism and side effects using only pure functions. The suitability of Haskell for real-world data science is then discussed, along with some examples of its users, a small Haskell-powered visualization, and an overview of useful packages for data science. Finally, Accelerate is introduced, an embedded DSL for array computations on the GPU, and an ongoing attempt to use it as the basis for a deep learning package.
22. Pure functions
● Output determined only by inputs
● No side effects
=> Result independent of evaluation strategy
Impure functions
● Randomness
● File IO
● Network
● Call impure functions
● Mutations
● Hard to reason about
● Requires reasoning
23. Monads
Ordinary value:
cube :: (Floating a) => a -> a
cube x = x * x * x
Just use the value.

Monad:
cubeM :: (Monad m, Floating a) => m a -> m a
cubeM mx = mx >>= (\x -> return (x * x * x))
Just use the value (inside a function you’ve bound to the monad using >>=).
24. Various Monad >>= implementations
IO monad: after the IO is performed
Maybe monad: if the value is not Nothing
Reader, Writer, State monad: immediately
List monad: for each element

cubeM :: (Monad m, Floating a) => m a -> m a
cubeM mx = mx >>= (\x -> return (x * x * x))
25. class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
return :: a -> m a
-- ...
Monads
● (In general) No way to extract value
● Result of >>= is m b, so no escape from m!
● Monads can function as tags in your source code
26. class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
return :: a -> m a
-- ...
Monads
● Return representation of side effects
● Control evaluation order
● Move non-determinism away from pure code
● Tag values resulting from impure computation
● Store information between computations
27. Syntactic sugar: Imperative syntax
● Each line evaluated inside a function passed to >>=
● Evaluation order of lines guaranteed
● answer is the name bound to an argument of one of these functions. It is available to functions defined inside this function.
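A small sketch of what that desugaring looks like (the code here is mine, but the shape is standard):

-- do notation...
main :: IO ()
main = do
  answer <- getLine                  -- "answer" is in scope for the rest of the block
  putStrLn ("You said: " ++ answer)

-- ...desugars to functions passed to >>=:
main' :: IO ()
main' = getLine >>= \answer ->
        putStrLn ("You said: " ++ answer)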
30. Libraries required for data science
Fast Vectors, Arrays, Linear Algebra
Machine learning, Deep learning
Probability and statistics
Big data
Plotting, Graphs, Visualization
31. Vectors, Arrays, Linear Algebra
Vec, Linear, Repa, Accelerate
Use type level literals to encode dimensions of arrays (Repa, Accelerate)
Use type level literals to encode length of vectors (Linear, Vec)
Accelerate EDSL for running computations on the GPU!
Compatibility - Use data types from Linear on Accelerate backends
34. Big Data
No Spark library, unfortunately
Hadron
Misc Hadoop libraries
Haskell-HBase, ElasticSearch, Cassandra, MongoDB, Redis
CloudHaskell
Kafka, ZeroMQ
Various DB connectors
39. Example: Density of OpenStreetMap points
Raw OSM points: 78 GB uncompressed, 2.9 billion points.
Plot the density of these on a globe.
Use Triangular binning because it might look cool
40. Data types
Point data type. Just use the Vec library
Triangle data type. Tuple of points
A point that stores extra info, for insertion into a KD Tree
55. Many types of space leak
http://blog.ezyang.com/2011/05/space-leak-zoo/
Enough to need their own zoo!
Memory leak
Strong reference leak
Thunk leak
Live variable leak
Streaming leak
Stack overflow
Selector leak
Optimization-induced leak
Thread leak
60. Data.IntMap
Strict or Lazy variety? Persistent or Ephemeral?
“The implementation is based on big-endian patricia trees. This data structure performs especially well on binary operations like union and intersection. However, my benchmarks show that it is also (much) faster on insertions and deletions when compared to a generic size-balanced map implementation (see Data.Map).”
● Chris Okasaki and Andy Gill, "Fast Mergeable Integer Maps", Workshop on ML, September 1998, pages 77-86, http://citeseer.ist.psu.edu/okasaki98fast.html
● D. R. Morrison, "PATRICIA -- Practical Algorithm To Retrieve Information Coded In Alphanumeric", Journal of the ACM, 15(4), October 1968, pages 514-534.
61. Data.IntMap is a persistent data structure!
Result => Horrendous space leak!
Fix by periodically rebuilding it.
Or, give in and use a mutable vector.
Every value is immutable, and every function is deterministic and free of side effects.
Usually compiled to native machine code, mainly by GHC. Can also be compiled to JavaScript or interpreted.
Statically typed. This gives the language a great deal of safety.
No implicit conversion between values, which can be annoying for beginners.
In a strict language, functions require their arguments to be evaluated. In Haskell, all values are lazy by default, but explicit strictness is allowed.
The Haskell wiki hints that there is a huge amount of proprietary, closed source financial code written in Haskell.
Facebook have been doing a lot of functional programming, for instance React.js, which now has several Haskell implementations or wrappers. However, from what I’ve read, the main area where they use Haskell seems to be their data science team.
BAE have used it.
There are nearly 80 packages in the package index under the section Bioinformatics, more than most other sections.
Oddly enough, Haskell is so underused that it’s easy to find top developers to work on your startup.
Google, Microsoft, Intel, and NVIDIA have all used Haskell.
Proof that it’s possible to throw something together quickly in Haskell
What follows is a whirlwind tour of Haskell, skimming through the most important bits, and ignoring the rest of it.
Putting expressions next to each other applies the left one to the right one. With multiple arguments, function application is left-associative.
This shows the type of the addition operator. The double colon denotes a type signature. Num a and anything to the left of a => is a type constraint. These allow us to be flexible about the types in our program, without compromising type safety
You don’t have to apply a function to all of its arguments. Here, we’ve partially applied addition to the number 5, and the result was a function that accepts the remaining arguments.
Higher order functions are functions that operate on or return other functions. Therefore all Haskell functions with arity > 1 are higher order functions. Map is a higher order function because it accepts a function as an argument. [a] denotes a list of type a. Here, we double all elements of a list by partially applying the multiplication operator to the number 2, then using map.
Just wanted to mention the lambda syntax. The backslash is supposed to look like a lambda.
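To make these notes concrete, a few minimal examples (addFive, double, and squares are hypothetical names of my own):

-- The type of addition, as above: (+) :: Num a => a -> a -> a

addFive :: Num a => a -> a
addFive = (+) 5                         -- partial application: still waiting for one argument

double :: Num a => [a] -> [a]
double = map (* 2)                      -- map is higher order: it takes a function as an argument

squares :: [Int]
squares = map (\x -> x * x) [1, 2, 3]   -- lambda syntax: the backslash stands in for λ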
This is equivalent to “class” in java/python or “defrecord” in clojure.
ChessGame is the type, and NotStarted, PlayerTurn, and CheckMate are its constructors.
A more convenient syntax for defining data types
firstName, lastName, and personID are automatically declared as accessor functions
I’ve also introduced a type variable s here. The type of the Person record depends on the type s. This might be useful because there are many different ways of representing a string in Haskell.
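A sketch of the declarations these notes describe (the constructor and field names follow the notes, but the exact slide code isn’t preserved here):

data ChessGame = NotStarted | PlayerTurn | CheckMate

data Person s = Person
  { firstName :: s      -- accessor generated automatically: firstName :: Person s -> s
  , lastName  :: s
  , personID  :: Int
  }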
_ = don’t care
Pattern matching is useful for branching on different constructors and values.
Pattern matching is useful for extracting fields for use in the function body
Pattern matching is powerful. It works on lists and tuples
Useful in many other places in Haskell
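For instance (reusing the hypothetical ChessGame type from above; these examples are mine):

describe :: ChessGame -> String
describe NotStarted = "not started"      -- branch on a constructor
describe CheckMate  = "checkmate"
describe _          = "in progress"      -- _ = don't care

swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)                     -- extract tuple fields

firstOrZero :: [Int] -> Int
firstOrZero []    = 0                    -- match the empty list
firstOrZero (x:_) = x                    -- match the head, ignore the tail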
Lists are just ordinary data definitions. They can be constructed by making an empty list, or by consing an element onto a list of similar elements. (:) is just a constructor.
Recursive definition of map using pattern matching.
At runtime, when map is applied, xs need not have been evaluated yet. This means that xs can be an infinite list! The recursion only stops when the caller stops evaluating the results that map returns.
Since the result returned is in Weak Head Normal Form, f, and the recursive part of map are left unevaluated.
Rarely have to define functions this way in real life.
In most languages, this would be inefficient, and cause a stack overflow for lists of a certain size. In Haskell, unevaluated bindings are represented with thunks, which are lightweight and are stored on the heap.
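The definition the notes are describing is the standard one (in a real module you’d import Prelude hiding (map) to avoid the name clash):

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

-- Laziness means this terminates, even though [1..] is infinite:
--   take 5 (map (* 2) [1..])  ==  [2, 4, 6, 8, 10]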
Suppose we define a data type to represent an AST for JSON….
Typeclasses allow you to write type constraints.
Their purpose is polymorphism over types.
A typeclass is a set of types for which certain functions have been defined.
The ‘instance’ declaration makes a type a member of a typeclass, and allows you to define those functions
Types can be members of multiple typeclasses
Typeclasses are open!
Much more flexible than inheritance!
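A sketch of the kind of thing described here: a JSON AST plus a hypothetical ToJSON typeclass (the names are mine, not a real library’s):

data JSON = JNull
          | JBool Bool
          | JNumber Double
          | JString String
          | JArray [JSON]
          | JObject [(String, JSON)]

class ToJSON a where          -- the set of types that can be rendered as JSON
  toJSON :: a -> JSON

instance ToJSON Bool where
  toJSON = JBool

-- Typeclasses are open: anyone can add instances later, in any module.
instance ToJSON a => ToJSON [a] where
  toJSON = JArray . map toJSON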
Expressions can be represented as graphs, where each node is a value, or an unevaluated thunk.
The next step is the application of graph reduction rules by the STG machine, which are based on the Lambda Calculus. The STG machine produces assembly language which carries out the operations required by these reductions. STG stands for Spineless Tagless G-machine, where G stands for graph. It lives in GHC and converts the lowest level of Haskell to Assembly language.
Sharing is applied to avoid recomputing the same values
The graph is reduced. Function application is a type of reduction. It’s more complicated than this in real life.
Expressions can be represented as graphs, where each node is a value, or an unevaluated thunk.
The next step is the application of graph reduction rules
If the top level of a graph is a constructor, the graph is said to be in Weak Head Normal Form. This is used as the return value, and the lower levels are evaluated lazily, as needed. This is how laziness is implemented in Haskell
This symbol is called Bottom. Bottom is an expression that can’t be evaluated, due to an infinite loop or an error. Bottom is a member of every type.
This is pronounced “bind”
In this slide I’m comparing monads to ordinary plain values. Monad is a typeclass whose instances define certain functions including >>= and return. The purpose of >>= is to allow us to provide a function that uses the plain value, and allow the Monad implementation to control how that function is called. We choose how to use the value inside the function, but the Monad’s implementation chooses how and when to call our function.
Different Monad instances implement >>= differently, and will call your function in different ways.
Can’t extract the plain value in general: there’s no way of using the a and returning anything other than an m b.
If you return a representation of your side effects, they are no longer really side-effects, they are the main return value of your function
getLine reads a line from STDIN. The value it returns does not directly contain the value it has read, so it can be considered pure.
Furthermore, chained use of >>= constructs a chain of dependent computations. This guarantees the evaluation order.
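Concretely, a chain like this can only run in one order, because each step is handed to the previous step’s >>= (a standard example, not from the slides):

main :: IO ()
main =
  getLine >>= \first ->
  getLine >>= \second ->               -- can only run after the first read
  putStrLn (first ++ " " ++ second)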
GHCI: Evaluate expressions, determine their types, inspect modules
Hoogle: Search for functions by their type signature
Cabal ~ a bit like make + package installer
Hackage is a package DB. Like most language package databases, contains a large amount of unmaintained, disused, abandonware. Unlike most language-specific package databases, this abandonware has a good chance of still working!
Fay, Haste, and GHCJS are Haskell to Javascript compilers.
Neural nets, Markov models, SVMs, Hopfield network, Restricted Boltzmann Machines, a Convnet, genetic algorithms, dynamic time warping, Many different clustering algorithms, and Kalman filters.
Unfortunately, none of the neural networks are capable of using the GPU yet, but one of my side projects is to build a deep learning library on top of Accelerate. The author of deeplearning-hs works for Facebook AI research and has experimented with Accelerate. dnngraph can generate models for Caffe and Torch.
Many different packages for probability and statistics. You can also call R from Haskell
No spark library unfortunately.
Various libraries for interacting with Hadoop, but only two libraries for running Haskell on Hadoop
Hadron. Hadron uses MapReduce streaming, and conduits. It requires Haskell to be installed on every node.
Cloud Haskell is a distributed concurrency framework.
There are also various connectors to the usual suspects like MySQL and PostgreSQL.
Haskell has many graph plotting libraries.
OpenGL and Haskell are an odd combination, but there are bindings, and they use Monads to represent the internal OpenGL state
Haskell is very good for writing declarative DSLs. There are libraries for writing HTML and CSS using the do notation. The result is checked, and it looks very much like HAML or SASS.
This generates a scatter plot and outputs it to a file. You could easily do this from GHCI
Here are some of the other plots this library is capable of. There are many other plotting libraries.
We’ll generate a spherical mesh using recursion
Octahedron. This is supposed to give you fairly even triangles. Turns out that if you start with a Tetrahedron, then after the first refinement on a given face, the middle triangle is twice the area of each of the other triangles. Also, Octahedrons are really easy to hardcode.
The do notation uses the Maybe Monad. If a line returns Nothing, then the rest of the block is skipped. We’ll find the nearest neighbor and inspect the triangles it’s part of, binning the point into the nearest one.
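The point-binning code itself isn’t reproduced in these notes, but here’s a runnable toy with the same shape (safeHead and firstSum are my own names):

safeHead :: [a] -> Maybe a
safeHead []    = Nothing
safeHead (x:_) = Just x

firstSum :: [Int] -> [Int] -> Maybe Int
firstSum xs ys = do
  x <- safeHead xs       -- if this is Nothing, the rest of the block is skipped
  y <- safeHead ys
  return (x + y)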
We’ll take a break now to look at some hacks and some problems we come across in Haskell
Here are some functions and types that are considered harmful, but have their uses. Please use them carefully.
undefined is bottom and if your program tries to evaluate it, it will crash at runtime. However, it can inhabit any type. This is useful for making dummy values to solve type errors in GHCI.
unsafePerformIO extracts a value from an IO monad by performing the IO. Using this function can introduce impurity into pure functions, resulting in undefined behaviour
IORef and MVar are mutable variables. Excessive use of these defeats the point of functional programming.
unsafeCoerce changes the type of a value without changing the value. This can cause segfaults.
trace wraps a unary function, printing out a string when it is evaluated
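For reference, trace from Debug.Trace has type String -> a -> a; a typical (illustrative) use:

import Debug.Trace (trace)

-- Prints its message whenever the result is actually demanded,
-- which also makes it a crude probe into lazy evaluation.
fib :: Int -> Int
fib n = trace ("fib " ++ show n) result
  where
    result | n < 2     = n
           | otherwise = fib (n - 1) + fib (n - 2)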
Haskell has quite a steep learning curve, because of its confusing jargon and type-system complications.
The monad tutorial fallacy: people imagine that the single explanation that finally made a concept click for them will work for everyone else, when actually it was just the last one in a long line of explanations they had read.
Despite ensuring type correctness and guarding against race conditions with immutability, runtime errors are still possible.
You can cause runtime errors by incorrectly using the FFI, or by not covering all cases in a pattern match, or simply by throwing an exception
If you don’t catch all cases in a pattern match, your program might bork at runtime. Annoyingly, this could be prevented at compile time, but only by setting a compiler flag
Haskell has Exceptions
error is essentially undefined with a message attached (in fact, undefined is defined in terms of error). It’s designed to represent a programming error, and bork with a message. Unfortunately it’s often used, so at some point you might need to catch it. Exceptions, however, are just data types, where the infrastructure required to throw and catch them is provided by a library.
fail is a method which must be implemented by all monads. In the IO monad in Haskell 98, fail calls error, but the Maybe monad has a sensible implementation. fail’s presence in the Monad class is generally regarded as a bad design decision, and it should be avoided.
There are many abstractions for dealing with different errors and exceptions, and many different monads. The monad transformer ErrorT is a good way of wrapping a monad with error and exception handling.
Maybe and Either are elegant ways of handling computations that fail. They carry a result, or alternatively, some failure information.
There are too many disparate ways of expressing and handling errors in Haskell.
If you divide a floating-point number by 0, you get Infinity rather than an error. Haskell could be safer if division returned a Maybe.
If you really wanted to, you could redefine division to explicitly handle division by 0, and use -Wall to make you handle it at compile time.
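A minimal sketch of that idea (safeDiv is a hypothetical name):

safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing         -- make the failure case explicit...
safeDiv x y = Just (x / y)    -- ...so callers must pattern match on the result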
Space leaks are a problem. It’s very easy to consume crazy amounts of memory in Haskell.
There are so many different types of space leak that they need their own zoo. You’ll probably come across that page at some point.
Suppose we want to find the length of a lazy list. The elegant but naive implementation on the left is slow, leaks memory, and overflows the stack.
The length of xs will be evaluated first, then the addition of 1. This causes traversal to the end of the list, building up thunks. When the end of the list is reached, the thunks must be evaluated in turn, then they can be freed. This is a lot of work just to perform simple addition.
In the implementation on the right, we recurse last rather than first. len’ is tail recursive. This means that the recursive call is the last (or outermost) computation. This can be optimised into iteration which does not build a chain of thunks.
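Reconstructed from the description (the slide code itself isn’t in these notes); note that the accumulator also needs to be strict, via BangPatterns, or the thunks just pile up in acc instead:

{-# LANGUAGE BangPatterns #-}

-- Naive: builds a chain of (1 +) thunks as long as the list.
len :: [a] -> Int
len []     = 0
len (_:xs) = 1 + len xs

-- Tail recursive with a strict accumulator: constant stack and heap.
len' :: [a] -> Int
len' = go 0
  where
    go !acc []     = acc
    go !acc (_:xs) = go (acc + 1) xs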
Haven’t bothered with benchmarks, in case they might not be meaningful.
For a given task, Haskell can be slower, because of unnecessary copying that might occur. However, due to laziness, it can skip unnecessary computation, so it’s possible that a Haskell implementation could be faster than a C/C++ implementation!
Conduits are a way of piping data around in constant memory. Sources are conduits that only produce and Sinks are conduits that only consume.
await consumes a value from the input, and yield produces a value.
The conduit recurses on itself
await consumes
yield produces
Recurse
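A sketch of that pattern against the conduit package (using the modern runConduitPure and .| spellings; older releases used $$ and =$= instead):

import Data.Conduit
import qualified Data.Conduit.List as CL

-- await consumes, yield produces, and the conduit recurses on itself:
doubler :: Monad m => ConduitT Int Int m ()
doubler = do
  mx <- await
  case mx of
    Nothing -> return ()                 -- upstream exhausted
    Just x  -> yield (x * 2) >> doubler  -- recurse

-- Runs in constant memory however long the input is:
example :: [Int]
example = runConduitPure (CL.sourceList [1 .. 10] .| doubler .| CL.consume)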
Turns out this leaks memory like a sieve, then crashes unless you have vast amounts of RAM, because of a bad choice of data structure.
Big-endian patricia trees are persistent data structures, so IntMap stores its entire state history!
The moral of the story is, it’s sometimes OK to use mutable data structures in Haskell.
Success! Triangular map of the density of heatmap points in OSM.
This is how to multiply two matrices in Accelerate, on the GPU. It works by replicating the matrices into 3d arrays, and transposing one of them, so that matrix multiplication is an elementwise product and a summation. In practice, GPU cycles are not wasted replicating the matrices, thanks to fusion that takes place in Accelerate.
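For reference, a sketch of that formulation, close to the well-known Accelerate matrix-multiply example (exact API details vary between accelerate versions):

import Data.Array.Accelerate as A

matMul :: A.Num e => Acc (Array DIM2 e) -> Acc (Array DIM2 e) -> Acc (Array DIM2 e)
matMul a b = A.fold (+) 0 (A.zipWith (*) aRepl bRepl)
  where
    Z :. rowsA :. _     = unlift (shape a) :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape b) :: Z :. Exp Int :. Exp Int
    -- Replicate both matrices into 3-D arrays (fused away, so no real copies)...
    aRepl = A.replicate (lift (Z :. All   :. colsB :. All)) a
    bRepl = A.replicate (lift (Z :. rowsA :. All   :. All)) (A.transpose b)
    -- ...so the product becomes an elementwise multiply and a fold (sum).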
One of my side projects is to implement symbolic differentiation for Accelerate, so that it’s easy to implement Deep learning, and you don’t have to spend time differentiating by hand when you want to add LSTMs or GRUs to your network.