This document introduces domain-specific languages (DSLs) and shows how to create them in Python. DSLs let users work in specialized mini-languages tailored to specific problem domains; SQL (for databases) and regular expressions (for text patterns) are familiar examples. The document then demonstrates how to build an external DSL for defining forms using PyParsing, and an internal DSL for the same task using Python features such as metaclasses. It concludes that DSLs make code easier to read, write, and maintain, even for non-programmers.
5. Life Without Regular Expressions
def is_ip_address(ip_address_string):
    components = ip_address_string.split(".")
    if len(components) != 4:
        return False
    try:
        int_components = [int(component) for component in components]
    except ValueError:
        return False
    for component in int_components:
        if component < 0 or component > 255:
            return False
    return True
6. Life With Regular Expressions
import re

def is_ip(ip_address_string):
    match = re.match(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$",
                     ip_address_string)
    if not match:
        return False
    for component in match.groups():
        if int(component) < 0 or int(component) > 255:
            return False
    return True
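A quick sanity check of both versions (this driver is not on the original slides):

print is_ip_address("192.168.0.1")    # True
print is_ip("192.168.0.1")            # True
print is_ip("999.168.0.1")            # False – component out of range
print is_ip_address("a.b.c.d")        # False – components are not numeric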
7. The DSL that simplifies our life
^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$
8. Why DSL - Answered
When working in a particular domain, write your code in a syntax that fits the domain.
When working with patterns, use RegEx.
When working with RDBMS, use SQL.
When working in your domain – create your own DSL.
9. The two types of DSLs
External DSL – The code is written in an external file or as a string, which is read and parsed by the application.
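The general shape is just "read, then parse". As a sketch — the forms/user.form path and the parse_form helper are made up for illustration:

# Hypothetical driver for an external DSL: the definition lives in a plain
# text file that the application reads and parses at runtime
with open("forms/user.form") as dsl_file:
    source = dsl_file.read()
form_definition = parse_form(source)  # parse_form() is the parser we build below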
10. The two types of DSLs
Internal DSL – Use features of the language (like metaclasses) to enable people to write code in Python that resembles the domain syntax.
11. Creating Forms – No DSL
<form>
  <label>Name:</label><input type="text" name="name"/>
  <label>Email:</label><input type="text" name="email"/>
  <label>Password:</label><input type="password" name="password"/>
</form>
12. Creating Forms – No DSL
– Requires HTML knowledge to maintain
– Therefore it is not possible for the end user to change the structure of the form by themselves
13. Creating Forms – External DSL
UserForm
name->CharField label:Username
email->EmailField label:Email Address
password->PasswordField
This text file is parsed and rendered by the app
14. Creating Forms – External DSL
+ Easy to understand form structure
+ Can be easily edited by end users
– Requires you to read and parse the file
15. Creating Forms – Internal DSL
class UserForm(forms.Form):
    username = forms.RegexField(regex=r'^\w+$', max_length=30)
    email = forms.EmailField(max_length=75)
    password = forms.CharField(widget=forms.PasswordInput())

Django uses metaclass magic to convert this syntax to an easily manipulated Python class.
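Not from the slides: a minimal sketch of the kind of metaclass trick Django uses, assuming a placeholder Field class. The metaclass collects the field attributes declared on the class body into a registry that the form machinery can iterate over:

class Field(object):
    """Stand-in for a real form field class."""
    pass

class FormMeta(type):
    def __new__(mcs, name, bases, attrs):
        # Collect every Field declared on the class body into one registry
        fields = dict((key, value) for key, value in attrs.items()
                      if isinstance(value, Field))
        attrs["declared_fields"] = fields
        return type.__new__(mcs, name, bases, attrs)

class Form(object):
    __metaclass__ = FormMeta  # Python 2 syntax, matching the era of these slides

class UserForm(Form):
    username = Field()
    email = Field()

print UserForm.declared_fields.keys()  # ['username', 'email'] (order may vary)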
16. Creating Forms – Internal DSL
+ Easy to understand form structure
+ Easy to work with the form as it is regular Python
+ No need to read and parse the file
– Cannot be used by non-programmers
– Can sometimes be complicated to implement
– Behind the scenes magic → debugging hell
17. Creating an External DSL
UserForm
name:CharField -> label:Username size:25
email:EmailField -> size:32
password:PasswordField
Let's write code to parse and render this form.
18. Options for Parsing
Using string functions → You have to be crazy
Using regular expressions →
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Writing a parser → ✓ (we will use PyParsing)
22. Step 3: Implement the Grammar
newline = "n"
colon = ":"
arrow = "->"
word = Word(alphas)
key = word
value = Word(alphanums)
field_type = oneOf("CharField EmailField PasswordField")
field_name = word
form_name = word
field_property = key + colon + value
field = field_name + colon + field_type +
Optional(arrow + OneOrMore(field_property)) + newline
form = form_name + newline + OneOrMore(field)
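To try the grammar against the form definition from slide 17 (this test driver is not on the slides):

input_form = """UserForm
name:CharField -> label:Username size:25
email:EmailField -> size:32
password:PasswordField
"""

print form.parseString(input_form)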
23. Quick Note
PyParsing itself implements a neat little internal DSL for you to describe the parser grammar.
Notice how the PyParsing code almost perfectly reflects the BNF grammar.
24. Output
> print form.parseString(input_form)
['UserForm', '\n', 'name', ':', 'CharField', '->',
 'label', ':', 'Username', 'size', ':', '25', '\n',
 'email', ':', 'EmailField', '->', 'size', ':', '32', '\n',
 'password', ':', 'PasswordField', '\n']

PyParsing has neatly parsed our form input into tokens. That's nice, but we can do more.
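The slide showing the next step (25) is missing from this transcript; PyParsing drops such noise tokens with Suppress, so it likely looked roughly like this sketch:

from pyparsing import Suppress

# Wrapping the punctuation in Suppress keeps it in the grammar
# but drops it from the parse results
newline = Suppress("\n")
colon = Suppress(":")
arrow = Suppress("->")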
26. Output
> print form.parseString(input_form)
['UserForm', 'name', 'CharField', 'label', 'Username',
 'size', '25', 'email', 'EmailField', 'size', '32',
 'password', 'PasswordField']

All the noise tokens are now removed from the parsed output.
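Slide 27 is also missing; the nesting in the next output comes from wrapping expressions in PyParsing's Group. Roughly (slide 29 shows the final field definition):

from pyparsing import Group

# Each key:value pair becomes its own sub-list in the results
field_property = Group(key + colon + value)
# Each field becomes a sub-list too
field = Group(field_name + colon + field_type +
              Group(Optional(arrow + OneOrMore(field_property))) + newline)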
28. Output
> print form.parseString(input_form)
['UserForm',
 ['name', 'CharField',
  [['label', 'Username'], ['size', '25']]],
 ['email', 'EmailField',
  [['size', '32']]],
 ['password', 'PasswordField', []]]

Related tokens are now grouped together in a list.
29. Step 6: Give Names to Tokens
form_name = word.setResultsName("form_name")
field = Group(field_name + colon + field_type +
Group(Optional(arrow + OneOrMore(field_property))) +
newline).setResultsName("form_field")
30. Output
> parsed_form = form.parseString(input_form)
> print parsed_form.form_name
UserForm
> print parsed_form.fields[1].field_type
EmailField
Now we can refer to parsed tokens by name
31. Step 7: Convert Properties to Dict
# field_property needs named parts for the parse action below
field_property = Group(key.setResultsName("property_key") +
                       colon +
                       value.setResultsName("property_value"))

def convert_prop_to_dict(tokens):
    prop_dict = {}
    for token in tokens:
        prop_dict[token.property_key] = token.property_value
    return prop_dict

field = Group(field_name + colon + field_type +
              Optional(arrow + OneOrMore(field_property))
                  .setParseAction(convert_prop_to_dict) +
              newline).setResultsName("fields", listAllMatches=True)
32. Output
> print form.parseString(input_form)
['UserForm',
['name', 'CharField',
{'size': '25', 'label': 'Username'}],
['email', 'EmailField',
{'size': '32'}],
['password', 'PasswordField', {}]
]
Sweet! The field properties are parsed into a dict
33. Step 8: Generate HTML Output
We need to walk through the parsed form and generate an HTML string out of it.
34.
def get_field_html(field):
    properties = field[2]
    label = properties["label"] if "label" in properties else field.field_name
    label_html = "<label>" + label + "</label>"
    attributes = {"name": field.field_name}
    attributes.update(properties)
    if field.field_type == "CharField" or field.field_type == "EmailField":
        attributes["type"] = "text"
    else:
        attributes["type"] = "password"
    if "label" in attributes:
        del attributes["label"]
    attributes_html = " ".join([name + "='" + value + "'"
                                for name, value in attributes.items()])
    field_html = "<input " + attributes_html + "/>"
    return label_html + field_html + "<br/>"

def render(form):
    fields_html = "".join([get_field_html(field) for field in form.fields])
    return "<form id='" + form.form_name.lower() + "'>" + fields_html + "</form>"
37. Wish we could do this...
> print Form(CharField(name="user", size="25", label="ID"),
             id="myform")
<form id='myform'>
  <label>ID</label>
  <input type='text' name='user' size='25'/><br/>
</form>

Neat, clean syntax that matches the output domain well. But how do we create this kind of syntax?
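Slides 38-42 are not in this transcript. A minimal sketch of base classes that would support the syntax above and the subclasses on the next slide — the HtmlElement and Input names come from slide 43, but everything else here is an assumption:

class HtmlElement(object):
    tag = "html"
    default_attributes = {}

    def __init__(self, *children, **attributes):
        self.children = children
        self.attributes = dict(self.default_attributes)
        self.attributes.update(attributes)

    def render_attributes(self):
        return " ".join("%s='%s'" % (name, value)
                        for name, value in self.attributes.items())

    def __str__(self):
        inner = "".join(str(child) for child in self.children)
        attrs = self.render_attributes()
        open_tag = "<%s %s>" % (self.tag, attrs) if attrs else "<%s>" % self.tag
        return open_tag + inner + "</%s>" % self.tag

class Input(HtmlElement):
    tag = "input"

    def __str__(self):
        # Inputs are self-closing and render a label before the field
        label = self.attributes.get("label", self.attributes.get("name", ""))
        attributes = dict(self.attributes)
        attributes.pop("label", None)
        attrs = " ".join("%s='%s'" % (n, v) for n, v in attributes.items())
        return "<label>" + label + "</label><input " + attrs + "/><br/>"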
43.
class Form(HtmlElement):
    tag = "form"

class CharField(Input):
    default_attributes = {"type": "text"}

class EmailField(CharField):
    pass

class PasswordField(Input):
    default_attributes = {"type": "password"}
48. Summary
+ DSLs make your code easier to read
+ DSLs make your code easier to write
+ DSLs make it easy for non-programmers to maintain code
+ PyParsing makes it easy to write External DSLs
+ Python makes it easy to write Internal DSLs