Dask - Parallelism for Machine Learning with PythonMatheus Pereira
Brief presentation of Dask, a Python library that provides advanced parallelism for analytics, enabling performance at scale for the tools you love (Pandas, Numpy and Scikit-Learn)
You can follow the presentation to discover more about Delayed, Futures and Distributed Work using Dask.
This presentation is an adaptation and simplification of the official dask-tutorial:
https://github.com/dask/dask-tutorial
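The core idea behind Dask's Delayed interface — calls record work in a task graph instead of computing immediately — can be illustrated with a toy pure-Python sketch. This is a conceptual mock-up only, not the real Dask API; see the tutorial linked above for the actual `dask.delayed` usage.

```python
# Toy illustration of the idea behind dask.delayed: decorated calls
# build a lazy task graph instead of computing right away, and
# compute() walks the graph to produce the result.

class Delayed:
    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Recursively evaluate any Delayed arguments first (the "graph").
        resolved = [a.compute() if isinstance(a, Delayed) else a
                    for a in self.args]
        return self.func(*resolved)

def delayed(func):
    """Wrap a function so calling it records work instead of doing it."""
    def wrapper(*args):
        return Delayed(func, *args)
    return wrapper

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

# Nothing has run yet; we only built a small task graph.
total = add(inc(1), inc(2))
print(total.compute())  # -> 5
```

In real Dask the graph is additionally scheduled across threads, processes, or a cluster — which is where the parallelism comes from.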
Continuous modeling - automating model building on high-performance e-Infrast...Ola Spjuth
This document discusses continuous modeling and automating model building on high-performance infrastructures. It notes the new challenges of data management, analysis, scaling, and automation posed by high-throughput technologies. The author's research focuses on enabling high-throughput biology through large-scale predictive modeling, evaluating performance, and automating model re-building. Predictive toxicology and pharmacology are becoming data-intensive due to more data sources. The document explores modeling large datasets on high-performance computing infrastructures and whether workflow systems or cloud/Big Data frameworks could improve modeling.
Hooking up Semantic MediaWiki with external tools via SPARQLSamuel Lampa
This document discusses integrating Semantic MediaWiki (SMW) with external tools using the RDFIO extension. It describes the motivation for RDFIO as allowing manual schema exploration, automated data generation, and community collaboration. RDFIO solves problems with SMW by allowing the choice of wiki page titles for RDF entities and exporting RDF in the original import format. Real-world uses of RDFIO include visualizing data on SMW pages and pulling data from R into SMW using SPARQL queries. The integration of SMW and Bioclipse is also discussed.
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in BioclipseSamuel Lampa
Contains a small background on the semantic web, and shows how Prolog is thought to be used from inside Bioclipse research software for RDF data handling.
Agile large-scale machine-learning pipelines in drug discoveryOla Spjuth
This document discusses challenges in scaling machine learning for drug discovery as data grows. The author describes their work developing automated workflows and pipelines for building predictive models on large datasets using techniques like Hadoop, Spark, and cloud computing. Their goal is to enable non-experts to build accurate models and make predictions in real-time as structures are modified. The document outlines several projects applying these techniques to problems like site-of-metabolism prediction, target prediction, and next-generation sequencing analysis. It evaluates challenges in scaling modeling to many datasets and targets on high performance computing clusters and private clouds.
SciPipe - A light-weight workflow library inspired by flow-based programmingSamuel Lampa
A presentation of the SciPipe workflow library, written in Go (Golang), inspired by Flow-based programming, at an internal workshop at Uppsala University, Department of Pharmaceutical Biosciences.
This document provides a cheat sheet overview of key concepts in the IRODS rule language, including numeric and string literals, arithmetic and comparison operators, functions for strings, lists, tuples, if/else statements, foreach loops, defining functions and rules, handling errors, and inductive data types. It describes syntax for defining data types using constructors, and using pattern matching to define functions over data types.
Nancy CLI, a unified way to manage automated database experiments. Nancy CLI is an automated database management framework based on well-known open-source projects and incorporating major open-source tools.
Using these tools, casual DBAs can conduct automated experiments today, either on AWS EC2 Spot instances or on any other servers. All you need is to tell Nancy which database to use, how to determine workloads and what you want to verify – say, check how some index will help, or compare various values of "default_statistics_target" for your database and your workload.
Everything else Nancy will do for you, in fully automated fashion, in the end presenting you detailed results for comparison.
The document discusses various techniques used to optimize the runtime of a Java application called TopicViewer from over 2 minutes down to 18 seconds. Key optimizations included replacing the Colt library with the faster Parallel Colt library, avoiding unnecessary object creation, using object arrays instead of maps for better performance, parallelizing loops, using non-blocking data structures for heavy I/O, and profiling the application to identify further opportunities for improvement. Together these changes reduced the runtime roughly sevenfold, from 2 minutes 10 seconds to 18 seconds.
The document discusses analyzing database systems using a 3D method for performance analysis. It introduces the 3D method, which looks at performance from the perspectives of the operating system (OS), Oracle database, and applications. The 3D method provides a holistic view of the system that can help identify issues and direct solutions. It also covers topics like time-based analysis in Oracle, how wait events are classified, and having a diagnostic framework for quick troubleshooting using tools like the Automatic Workload Repository report.
Martin Goodson describes his experience with Spark over three phases. In Phase I, he worked with various data processing tools like R, Python, Pig and Spark. In Phase II, he focused on Pig and Python UDFs. In Phase III, he plans to explore PySpark. He also discusses Skimlinks' data volume of 30TB per month, their data science team, and some realities of working with Spark including configuration challenges and common errors.
Operationalizing Clojure in mature enterprises can be difficult. I'm presenting a case study from my experience deploying and maintaining a Clojure application for delivering ad-free videos to the ISS for NASA. The goal is to tease out the core principles that make an application "operational".
The document provides advice for feeling overwhelmed as a newcomer to an OCaml project, recommending using the OCamlSpotter tool to more easily search and navigate code by leveraging metadata from the compiler rather than manual searching or grep. It describes how OCamlSpotter works and can be integrated with editors, and argues that it is a proven solution for locating symbols in OCaml code.
Fine-tuning your development environment means more than just getting your editor set up just so -- it means finding and setting up a variety of tools to take care of the mundane housekeeping chores that you have to do -- so you have more time to program, of course! I'll share the benefits of a number of yak shaving expeditions, including using App::GitGot to batch manage _all_ your git repos, App::MiseEnPlace to automate getting things _just_ so in your working environment, and a few others as time allows.
Delivered at OpenWest 2016, 13 July 2016
Systematic Generation of Data and Types in C++Sumant Tambe
This presentation will discuss two classic techniques from the functional domain — composable data generators and property-based testing — implemented in C++14 for testing a generic serialization and deserialization library. We will look at a systematic technique of constructing data generators from a mere random number generator and random type generation using compile-time meta-programming. Along the way, we will discuss monoids, functors, and monads as we encounter them.
Android Performance Optimization, a presentation delivered at DevFest Ankara 2013, DevFest Eskişehir 2013, DevFest Konya 2014, and DevFest İstanbul 2014.
Testing and validating spark programs - Strata SJ 2016Holden Karau
Apache Spark is a fast, general engine for big data processing. As Spark jobs are used for more mission-critical tasks, it is important to have effective tools for testing and validation. Expanding her Strata NYC talk, “Effective Testing of Spark Programs,” Holden Karau details reasonable validation rules for production jobs and best practices for creating effective tests, as well as options for generating test data.
Holden explores best practices for generating complex test data, setting up performance testing, as well as basic unit testing. The validation component will focus on how to create reasonable validation rules given the constraints of Spark’s accumulators.
Unit testing of Spark programs is deceptively simple. Holden looks at how unit testing of Spark itself is accomplished and distills a number of best practices into traits we can use. This includes dealing with local mode cluster creation and tear down during test suites, factoring our functions to increase testability, mock data for RDDs, and mock data for Spark SQL. A number of interesting problems also arise when testing Spark Streaming programs, including handling of starting and stopping the streaming context, providing mock data, and collecting results, and Holden pulls out simple takeaways for dealing with these issues.
Holden also explores Spark’s internal methods for generating random data, as well as options using external libraries to generate effective test datasets (for both small- and large-scale testing). And while acceptance tests are not always thought of as part of testing, they share a number of similarities, so Holden discusses which counters Spark programs generate that we can use for creating acceptance tests, best practices for storing historic values, and some common counters we can easily use to track the success of our job, all while working within the constraints of Spark’s accumulators.
streamparse and pystorm: simple reliable parallel processing with stormDaniel Blanchard
Storm is a distributed real-time computation system that dramatically simplifies processing streaming data. streamparse allows Python code to integrate with Storm by providing a Pythonic API. It handles running, debugging, and deploying Storm topologies to clusters through commands like "sparse run" and "sparse submit".
EuroPython 2011 High Performance PythonIan Ozsvald
I ran this as a four-hour tutorial at EuroPython 2011 to teach high-performance Python coding.
Techniques covered include bottleneck analysis via profiling, bytecode analysis, conversion to C using Cython and ShedSkin, use of the NumPy numerical library and numexpr, multi-core and multi-machine parallelisation, and CUDA GPUs.
Write-up with 49 page PDF report: http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/
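The bottleneck-analysis step the tutorial opens with can be sketched with the standard library's cProfile alone; the function profiled here is just an illustrative stand-in, not code from the tutorial.

```python
# Minimal sketch of bottleneck analysis with the standard library's
# cProfile and pstats (the profiled function is purely illustrative).
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive loop -- the kind of hotspot profiling reveals,
    # and the kind NumPy or Cython would later eliminate.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Report the five most expensive calls by cumulative time.
stats_output = io.StringIO()
pstats.Stats(profiler, stream=stats_output) \
    .sort_stats("cumulative") \
    .print_stats(5)
print(stats_output.getvalue())
```

Once the hotspot is identified, the tutorial's other techniques (Cython, NumPy, parallelisation) target exactly that function rather than the whole program.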
Beyond the Callback: Yield Control with Javascript GeneratorsDarren Cruse
Generators allow for control flow that yields values incrementally instead of returning all at once. They were introduced in ES6 and provide an alternative to callbacks for asynchronous code. Many task runners and libraries have been created using generators to simplify asynchronous patterns. More recently, the async/await syntax was proposed to build on generators with a cleaner syntax resembling synchronous code. Overall, generators and async/await make asynchronous code easier to read and maintain through more natural control flow.
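The incremental-yield idea described above is the same one Python generators implement (and the subject of this page's parent talk); a minimal Python sketch of the concept:

```python
# A generator yields values one at a time instead of returning a
# complete collection, so the consumer controls when each value is
# produced -- the same control-flow inversion ES6 generators provide.
def countdown(n):
    while n > 0:
        yield n        # execution pauses here until the next request
        n -= 1

gen = countdown(3)
print(next(gen))       # -> 3 (only the first value has been computed)
print(list(gen))       # -> [2, 1] (consuming the rest drives the loop)
```

Coroutine-style async frameworks in both languages build on this pause/resume mechanism, which is why async/await could be layered on top of generators.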
This document discusses ways to achieve cheap high-performance computing (HPC). It describes how graphics processing units (GPUs) can be used for general purpose computing through APIs like CUDA and OpenCL. GPUs have many parallel compute units that can speed up massively parallel problems. The document also discusses using cloud computing resources and Windows HPC Server to harness unused computing power for HPC tasks in a cost effective manner.
Hacking iOS Simulator: writing your own plugins for SimulatorAhmed Sulaiman
What the simctl command-line tool is, how to achieve a great user experience with method swizzling, and how to build dynamic libraries for iOS as plugins.
Where you could apply the knowledge of writing your own iOS Simulator plugins, and how it will make you a better developer.
Delivered at CocoaHeads Kyiv #15.
This document discusses using Python for scientific computing. It begins by listing popular programming languages for scientific purposes, including Fortran, MATLAB, Scilab, GNU Octave, Mathematica, and Python. While MATLAB is currently the most popular, it is proprietary software. Python is introduced as a free and open source alternative with many scientific libraries like NumPy, SciPy, scikit-learn, and Matplotlib. These libraries allow Python to perform similarly to MATLAB. Instructions are provided for installing the necessary Python packages on Linux, Unix, and Windows systems. Examples demonstrate basic Python syntax and how to perform tasks like importing data, visualization, and machine learning classification.
CoreData - there is an ORM you can like!Tomáš Jukin
The document discusses Core Data, an object graph and persistence framework for managing and persisting data in iOS and macOS applications. It provides an overview of why and when to use Core Data, noting that it handles common tasks like data schema changes, syncing with servers and peers, undo/redo, threading, and notifying code of changes. It also discusses alternatives like MagicalRecord that simplify Core Data usage. The document recommends using Core Data for any non-trivial application where requirements may change over time, and provides code examples for common Core Data tasks like fetching, saving context, and queries.
2012 07 making disqus realtime@euro pythonAdam Hitchcock
1) DISQUS implemented a real-time component to increase user engagement by getting new data to users quickly.
2) The architecture uses Redis for pub/sub messaging between frontend Flask servers using Gunicorn and gevent, and backend Python services that format and publish messages.
3) Testing was important to ensure the real-time system could handle DISQUS's large traffic, and metrics were used to measure and optimize performance.
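The pub/sub pattern at the heart of that architecture can be sketched in a few lines of pure Python; Redis plays the broker role in the real system, and all names below are invented for illustration.

```python
# Toy in-memory sketch of the publish/subscribe pattern: backend
# services publish formatted messages to a channel, and every
# subscribed frontend is notified. Redis fills the broker role in
# the architecture described above; this mock is illustrative only.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        # A frontend registers interest in a channel.
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # A backend pushes one message to all current subscribers.
        for callback in self.subscribers[channel]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("new-comments", received.append)
broker.publish("new-comments", {"thread": 42, "text": "hello"})
print(received)  # -> [{'thread': 42, 'text': 'hello'}]
```

The decoupling shown here is what lets the publishing services and the Flask/gevent frontends scale independently.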
Using Flow-based programming to write tools and workflows for Scientific Comp...Samuel Lampa
The document summarizes Samuel Lampa's talk on using flow-based programming for scientific computing. It provides biographical information on Samuel Lampa, including his background in pharmaceutical bioinformatics and current work. It then gives an overview of flow-based programming, describing it as using black box processes connected by data flows, with connections specified separately from processes. Benefits mentioned include easy testing, monitoring, and changing connections without rewriting components. Examples of using FBP in Go are also presented.
Linked Data for improved organization of research dataSamuel Lampa
Slides for a talk at a Farmbio BioScience Seminar on May 18, 2018, at http://farmbio.uu.se, introducing Linked Data as a way to manage research data that better keeps track of provenance, makes its semantics more explicit, and makes it more easily integrated with other data and consumed by others, both humans and machines.
More Related Content
Similar to Python Generators - Talk at PySthlm meetup #15
How to document computational research projectsSamuel Lampa
These slides are from an internal meeting at pharmb.io where we discussed ways to improve documentation of our internal computational research projects. The winning solution turns out to be markdown files versioned with git. The slides explain a little bit about why.
Reproducibility in Scientific Data Analysis - BioScience SeminarSamuel Lampa
Slides for a talk held at BioScience Seminar at Dept. of Pharmaceutical BioSciences at Uppsala University on December 16, 2016.
The event webpage: http://www.farmbio.uu.se/calendar/kalendarium-detaljsida/?eventId=22496
Structure of the talk:
Reproducibility in Scientific Data Analysis ...
● What is it?
● Why is it important?
● Why is it a problem?
● What can we do about it?
● What does pharmb.io do about it?
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Samuel Lampa
A very quick overview of how Vagrant, Ansible and Docker fit nicely together as a very productive and flexible solution for creating automated development environments.
AddisDev Meetup ii: Golang and Flow-based ProgrammingSamuel Lampa
The document discusses flow-based programming (FBP), its history and concepts. FBP defines applications as networks of processes that exchange data through message passing over predefined connections. This allows the processes to be reconnected without changing their code. The document provides examples of FBP networks and components implemented in various languages like Go, Java and JavaScript. It also discusses the benefits of FBP and its growing popularity with implementations like NoFlo.
First encounter with Elixir - Some random thingsSamuel Lampa
The document discusses Samuel Lampa's first encounter with the programming language Elixir. It covers calculating GC ratios in DNA sequences, provides a DNA sequence example file, and compares Elixir processes to Go channels, noting that Elixir processes are named mailboxes tied to a process while Go channels are anonymous and separate from routines. The document is authored by Samuel Lampa from Uppsala University.
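The GC-ratio calculation mentioned above can be sketched in a few lines (shown here in Python rather than Elixir, purely as an illustrative sketch, not code from the talk):

```python
def gc_ratio(seq: str) -> float:
    """Fraction of bases in a DNA sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

print(gc_ratio("ATGCGC"))  # 4 of the 6 bases are G or C
```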
Profiling go code a beginners tutorialSamuel Lampa
This document summarizes a presentation on profiling Go code. It introduces pprof, Go's built-in profiler, together with Dave Cheney's profile package that makes it easier to use. It demonstrates profiling a string processing program and shows the performance improvements from various optimizations. It recommends resources for learning more about profiling Go programs with pprof and high-performance Go programming.
This document provides an overview of flow-based programming (FBP). FBP is a programming paradigm where applications are defined as networks of black box processes that exchange data through predefined connections. These connections can be redefined without changing the internal processes, allowing for endless reconfiguration. FBP was invented in the 1960s and has seen a resurgence of interest with tools like NoFlo that allow building distributed applications as connected processes. The document discusses several open source FBP implementations and frameworks and provides examples of how FBP has been used to build applications and bioinformatics libraries.
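The core FBP idea described above — black-box processes exchanging data only through predefined connections — can be sketched in a few lines of Python (an illustrative sketch under stated assumptions, not any of the frameworks mentioned here), using threads as processes and queues as connections:

```python
import queue
import threading

SENTINEL = object()  # signals end-of-stream on a connection

def producer(out_q):
    """Process 1: emit data packets onto its outport."""
    for word in ["flow", "based", "programming"]:
        out_q.put(word)
    out_q.put(SENTINEL)

def upcase(in_q, out_q):
    """Process 2: transform packets; knows nothing about its neighbours."""
    while (packet := in_q.get()) is not SENTINEL:
        out_q.put(packet.upper())
    out_q.put(SENTINEL)

def collect(in_q, results):
    """Process 3: gather packets into a result list."""
    while (packet := in_q.get()) is not SENTINEL:
        results.append(packet)

# The "network": wire the processes together via connections (queues).
# Rewiring the network means changing only this section, not the processes.
c1, c2, results = queue.Queue(), queue.Queue(), []
threads = [
    threading.Thread(target=producer, args=(c1,)),
    threading.Thread(target=upcase, args=(c1, c2)),
    threading.Thread(target=collect, args=(c2, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['FLOW', 'BASED', 'PROGRAMMING']
```

Note that none of the three processes reference each other directly; only the wiring section knows the topology, which is what allows the "endless reconfiguration" the paradigm promises.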
RDFIO is an RDF import and query extension for MediaWiki. It allows users to import RDF triples into MediaWiki and query the triples using SPARQL. The architecture includes an in-memory RDF store to hold the triples and a SPARQL endpoint for querying. Future plans include enhancing editing capabilities via templates and importing triples on a per-page basis. Samuel Lampa presented RDFIO and is looking for additional ideas to improve the extension.
My lightning talk at Go Stockholm meetup Aug 6th 2013Samuel Lampa
This document discusses flow-based programming, an approach to programming invented in the 1970s where the flow of data between components is emphasized. It was successfully used in several domains including data analysis, banking software, and digital signal processing. New implementations of flow-based programming include NoFlo for Node.js and GoFlow, an open-source implementation in Go. More information on flow-based programming can be found on the listed websites.
Samuel Lampa presented his MSc thesis on integrating SWI-Prolog as a semantic querying tool in Bioclipse. He demonstrated [1] how SWI-Prolog can be used for semantic querying of biological data in RDF format within Bioclipse, [2] examples of SPARQL and Prolog code used to perform semantic queries, and [3] benchmarking of Prolog's performance as a semantic querying tool. The work adds new semantic querying functionality to Bioclipse using SWI-Prolog and demonstrates its ability to efficiently query biological data.
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in BioclipseSamuel Lampa
This document summarizes Samuel Lampa's 2010 degree project on integrating SWI-Prolog for semantic reasoning in Bioclipse. It compares SWI-Prolog to other semantic tools like Jena and Pellet in terms of speed and expressiveness when querying biochemical data. Prolog code is presented for querying NMR spectrum data that finds molecules with peak values near a search value. SPARQL queries for the same use case are also shown. Observations indicate Prolog is fastest while SPARQL is easier to understand but Prolog allows easier parameter changes and logic reuse. A final presentation was planned for April 28, 2010.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into a serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
5–7. Using Generator function
A yield (“return”) for every iteration, so it can be used immediately without a temp data structure == GOOD!
So, this is the generator function … which can be iterated over, like this ...
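The slide's point can be shown with a minimal Python sketch (an illustrative example, not code from the original deck): a generator yields one value per iteration, so the caller can consume each result immediately without building a temporary data structure first.

```python
def squares(n):
    """Yield one square per iteration -- no temporary list is built."""
    for i in range(n):
        yield i * i  # a "return" for every iteration

# ... which can be iterated over, like this:
for sq in squares(4):
    print(sq)  # prints 0, 1, 4, 9, one value at a time

# The full sequence only materializes if we explicitly ask for it:
print(list(squares(4)))  # [0, 1, 4, 9]
```

Because each value is produced on demand, the caller can stop early (or chain further processing) without ever holding the whole sequence in memory.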