Slides for a 20-minute presentation about Julia, with a brief introduction to multiple dispatch (multimethods) and how it is used for numerical linear algebra
Julia: Multimethods for abstraction and performance (Jiahao Chen)
This document discusses Julia's use of multimethod dispatch and generic functions to provide both data abstraction and high performance. Julia bridges the divide between computer science and computational science by allowing users to write generic code that can also achieve native machine performance when applicable. Its type system and multimethods allow methods to be specialized for different argument types, enabling both abstraction and optimized implementations.
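A minimal sketch of that idea in Julia (the function and types below are illustrative, not taken from the slides): one generic function carries several methods, and dispatch selects the most specific method for the argument types at hand.

    # One generic function, several methods: dispatch picks the most specific match.
    describe(x) = "some value"              # generic fallback for any type
    describe(x::Number) = "a number"        # specialization for all numbers
    describe(x::Integer) = "an integer"     # further specialization

    describe("hi")   # -> "some value"
    describe(2.5)    # -> "a number"
    describe(42)     # -> "an integer"

Because each method is compiled separately for the concrete argument types it receives, the abstraction adds no run-time cost.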
This document introduces Julia, an open-source programming language for technical computing with features like high performance, multiple dispatch, and ease of use. It has over 500 packages and 700 contributors. Julia bridges the gap between computer science and computational science by allowing for both high-level programming and high performance via just-in-time compilation. Its type system helps express scientific computations and improves performance. Other features include object-oriented programming with multiple dispatch, native parallelism, and domain-specific languages like JuMP for optimization. Julia has comparable or better performance than other numerical computing languages.
Julia? Why a new language, and an application to genomics data analysis (Jiahao Chen)
The document introduces Julia, a new programming language designed to be understandable by both programmers and compilers, allowing for high performance without sacrificing usability or flexibility. It discusses Julia's just-in-time compilation approach and how its design aims to overcome limitations of other languages like MATLAB, R, and Python. The document also provides an overview of Julia's community and ecosystem including package authors and contributors.
A Julia package for iterative SVDs with applications to genomics data analysis (Jiahao Chen)
This document discusses a Julia package for iterative singular value decompositions (SVDs) with applications to genomics data analysis. It introduces SVD and its use in genome-wide association studies to identify genetic factors associated with diseases and traits. It summarizes different iterative SVD methods like Lanczos iteration and challenges like loss of orthogonality. The document presents a new Julia package called FlashPCA that uses a blocked power method to quickly approximate the top SVD components, and compares its performance to other iterative SVD solvers on large genomic datasets.
This document summarizes work done by the Julia Labs at MIT on genomics data analysis and optimization of principal component analysis algorithms for genome-wide association studies. It describes how a native Julia implementation of PCA was able to reduce computation time for finding the top 10 principal components of an 80,000×40,000 genotype matrix from over 2,900 seconds to just 81 seconds. It also discusses how custom matrix-vector multiplication functions allowed the same computation speed while using 32x less memory by reading directly from a compressed data format. Future work directions include more complex analytics, improved data imputation methods, out-of-core matrix operations, and accessing different data formats.
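The custom matrix-vector product can be sketched as follows; the type name, field names, and two-bit packing scheme are hypothetical stand-ins for the actual compressed genotype format, which is not specified here.

    # Hypothetical sketch: wrap compressed genotype data in a matrix-like type and
    # give it its own multiplication method. Iterative PCA/SVD code that only ever
    # computes A*v then runs on the compressed data without ever expanding A.
    struct PackedGenotypeMatrix
        data::Vector{UInt8}   # four two-bit genotype codes packed per byte, column-major
        m::Int                # number of rows (samples)
        n::Int                # number of columns (markers)
    end

    Base.size(A::PackedGenotypeMatrix) = (A.m, A.n)

    function Base.:*(A::PackedGenotypeMatrix, v::AbstractVector{Float64})
        y = zeros(A.m)
        for j in 1:A.n, i in 1:A.m
            k = (j - 1) * A.m + i - 1            # zero-based linear index of entry (i, j)
            byte = A.data[(k >> 2) + 1]          # four entries stored per byte
            g = (byte >> (2 * (k & 3))) & 0x03   # extract the two-bit genotype code
            y[i] += g * v[j]
        end
        return y
    end

    A = PackedGenotypeMatrix(UInt8[0b11100100], 2, 2)  # the matrix [0 2; 1 3]
    A * [1.0, 1.0]                                     # -> [2.0, 4.0]

Multiple dispatch then routes every A*v inside a power iteration to this method automatically.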
Julia, genomics data and their principal components (Jiahao Chen)
This document summarizes the average ratings that different users have given to an Uber driver based on the user IDs and ratings provided. It shows solutions to calculating the average ratings in R, MATLAB, APL, C, and Julia. The solutions demonstrate different programming paradigms and language features like data abstraction, higher-order functions, vectorization, multiple dispatch, and dynamic typing in each language.
Meta-learning, or learning how to learn, is our innate ability to learn new, ever more complex tasks very efficiently by building on prior experience. It is a very exciting direction for machine learning (and AI in general). In this tutorial, I introduce the main concepts and state of the art.
The document discusses different approaches to meta-learning, or learning to learn. It begins by explaining how humans are able to learn new tasks more quickly by leveraging prior knowledge from similar tasks. It then covers three main approaches to meta-learning for machine learning models: 1) Starting with what generally works based on previous task performance data, 2) Starting from what is most likely to work for similar tasks based on task meta-features, and 3) Starting from previously trained models on very similar tasks via transfer learning. The document dives into various techniques within each of these three approaches, such as warm-starting optimization searches, learning task embeddings, and few-shot learning.
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability to effectively encode a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations and we learn property-specific vector representations of users and items by applying neural language models on the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning-to-rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
This document discusses last week's coding exercise on data harvesting and storage. It provides step-by-step explanations of code used to extract and analyze data from a JSON file. Examples include printing video titles, calculating the average number of tags per video, determining the most commented-on porn category, and finding the most frequently used words. The document also covers APIs, scrapers, file formats like JSON and CSV, and how to store extracted data.
Transfer Learning -- The Next Frontier for Machine Learning (Sebastian Ruder)
Sebastian Ruder gave a presentation on transfer learning in machine learning. He began by defining transfer learning as applying knowledge gained from solving one problem to a different but related problem. Transfer learning is now important because machine learning models have matured and are being widely deployed, but often lack labeled data for new tasks or domains. Ruder discussed examples of transfer learning in computer vision and natural language processing. He described his research focus on finding better ways to transfer knowledge between domains, tasks, and languages in large-scale, real-world applications.
This document discusses a lecture on data harvesting and storage. It covers APIs, RSS feeds, scraping and crawling as methods for collecting data from various sources. It also discusses storing data in formats like CSV, JSON, and XML. The document provides code examples for working with JSON data and discusses tools for long-term data collection like DMI-TCAT.
Robot Localisation: An Introduction - Luis Contreras 2020.06.09 | RoboCup@Hom... (robocupathomeedu)
The document summarizes Luis Contreras' upcoming lecture on robot localization using particle filters. The key points covered are:
1. Robot localization is the process of determining a robot's pose (position and orientation) over time using motion and sensor measurements within a map.
2. Particle filters represent the robot's uncertain pose as a set of weighted particles, with each particle being a hypothesis of the robot's state.
3. As the robot moves and senses its environment, the particles are propagated and weighted according to the motion and sensor models to estimate the posterior probability distribution over poses.
Open & reproducible research - What can we do in practice? (Felix Z. Hoffmann)
- There is a reproducibility crisis in computational research even when code is made available. Out of 206 computational studies in Science magazine since a policy change mandating sharing, only 26 directly provided their code and data. Of those judged potentially reproducible when code was available, more than half still required significant effort to reproduce.
- Making research fully reproducible requires addressing issues like difficult computational environments, long run times, dependency on previous results, and clarity on what is required to reproduce a single finding. Following principles like ensuring code is re-runnable, repeatable, reproducible, reusable, and replicable can help achieve reproducibility. Publishing code on platforms like Zenodo and OSF can also aid reproducibility.
Understanding ECG signals in the MIMIC II database (Jiahao Chen)
This document discusses understanding ECG signals in a database and building a computer model to analyze them. It notes that ECGs can reveal many heart problems and challenges include missing data, incomplete records, and abnormal data. It proposes extracting features from records and using techniques like PCA. A referenced study describes a dynamical model for generating synthetic ECG signals that models the placement and shape of peaks, breathing, and heartbeat variations. The goal is to understand normal behavior before detecting abnormalities.
Programming languages: history, relativity and design (Jiahao Chen)
R solution: Uses dplyr functions to group the data by user ID, then calculates the average rating for each user using summarize() and mean(). Returns a tibble with user ID and average rating.
MATLAB solution: Uses accumarray() to bin the ratings by user IDs, then calculates the mean rating within each bin. Returns a vector with average ratings for each unique user ID.
APL solution: Defines vectors for user IDs and ratings. Finds unique user IDs and indexes to bin ratings. Calculates the average rating within each bin by summing the product of ratings and indexes, then dividing by the sum of indexes. Returns vectors of average ratings and unique user IDs.
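The Julia solution is not reproduced here; a minimal sketch of one idiomatic approach (the sample data are invented for illustration):

    # Group ratings by user ID with a Dict, then average each group.
    ids     = [1, 2, 1, 3, 2]            # user ID for each rating (sample data)
    ratings = [5.0, 4.0, 3.0, 5.0, 2.0]  # the ratings themselves (sample data)

    groups = Dict{Int, Vector{Float64}}()
    for (id, r) in zip(ids, ratings)
        push!(get!(groups, id, Float64[]), r)
    end
    averages = Dict(id => sum(rs) / length(rs) for (id, rs) in groups)
    # -> Dict(1 => 4.0, 2 => 3.0, 3 => 5.0)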
A very short tour through the Julia community and how key features of the language interact to produce an expressive syntax that users like without sacrificing performance
Resolving the dissociation catastrophe in fluctuating-charge models (Jiahao Chen)
The document discusses issues that arise when using fluctuating charge models to describe chemical systems. It summarizes the concept of fluctuating charges based on electronegativity equalization. However, this leads to an unphysical "dissociation catastrophe" where charges do not decay to zero at infinite separation. The document proposes fixing this by introducing distance-dependent electronegativity or charge transfer variables between atoms to attenuate long-range charge transfer. It also discusses the topological relationship between charge transfer variables and atomic charges to convert between representations.
A brief introduction to Hartree-Fock and TDDFT (Jiahao Chen)
The document provides an overview of time-dependent density functional theory (TDDFT) for computing molecular excited states. It begins with an introduction to the Born-Oppenheimer approximation and variational principle. It then discusses the Hartree-Fock and Kohn-Sham equations as self-consistent field methods for calculating ground states, and linear response theory for calculating excited states within TDDFT. The contents section outlines the topics to be covered, including basis functions, Hartree-Fock theory, density functional theory, and time-dependent DFT.
Excitation Energy Transfer In Photosynthetic Membranes (Jiahao Chen)
This document summarizes research on excitation energy transfer in the light harvesting complex II (LHC-II) found in plants. It discusses how LHC-II is able to efficiently funnel light energy absorbed by chlorophyll pigments to the reaction center in picoseconds using two mechanisms: Dexter electron exchange and Förster dipole-dipole interactions. Computational modeling of LHC-II's atomic coordinates and transition dipole strengths shows that the strongest couplings between chlorophylls lead to the fastest energy transfer rates, with an overall light harvesting efficiency of 98.7% and a mean passage time of 13.52 picoseconds for excitons to travel between pigments.
1. Atomic charges are easy to define for isolated atoms or ions but difficult to quantify precisely for atoms in molecules.
2. When bonds form, electrons redistribute such that atomic charges may become fractional, with the bonding electron pair sometimes lying between or shifted toward one atomic center.
3. Mulliken developed the concept of electronegativity to describe how electrons are redistributed in molecules based on differences in atoms' abilities to attract electrons.
The document discusses the Julia programming language. It highlights that Julia bridges the gap between computer science and computational science by allowing for both data abstraction and high performance. Julia uses multiple dispatch as its core programming paradigm, which allows functions to have different implementations depending on the types of their arguments. This enables Julia to perform efficiently on a wide range of technical computing tasks.
Theory and application of fluctuating-charge models (Jiahao Chen)
This document discusses fluctuating charge models, which map molecules to electrical circuits by representing atomic charges as voltages and charge-charge interactions as capacitances. It introduces the QEq model, which has problems with metallicity and incorrect charge transfer asymptotics. The document then presents the new QTPIE model, which addresses these issues by representing charges as charge transfer variables between atoms and incorporating an exponential distance-dependent attenuation of voltages. This gives QTPIE the correct charge behavior for dissociated systems.
The document discusses NumPy and SciPy, two popular Python packages for scientific computing. NumPy adds support for large, multi-dimensional arrays and matrices to Python. It also introduces numeric data types and provides operations such as linear algebra on array objects. SciPy builds on NumPy and contains modules for optimization, integration, interpolation and other tasks. Together, NumPy and SciPy provide a powerful yet easy-to-use environment for numerical computing in Python.
OSCON 2014: Data Workflows for Machine Learning (Paco Nathan)
This document provides examples of different frameworks that can be used for machine learning data workflows, including KNIME, Python, Julia, Summingbird, Scalding, and Cascalog. It describes features of each framework such as KNIME's large number of integrations and visual workflow editing, Python's broad ecosystem, Julia's performance and parallelism support, Summingbird's ability to switch between Storm and Scalding backends, and Scalding's implementation of the Scala collections API over Cascading for compact workflow code. The document aims to familiarize readers with options for building machine learning data workflows.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
The document summarizes Lynn Cherny's work setting up a data science program at emlyon business school. It discusses the courses taught in the first year of the program and plans for the next year. It also describes a student project analyzing job postings using skills extracted from text with word embeddings to identify gaps between teaching and job requirements. Ideas are proposed for improving the curriculum and student job searches.
This is a presentation I did for the new interns at Duo Software, in which I highlight the pros and cons of being creative and of following widely used best practices in software development.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
The document describes an ontology called Exposé that was created for machine learning experimentation. The ontology aims to formally represent key aspects of machine learning experiments such as algorithm specifications, implementations, applications, experimental contexts, evaluation functions, and structured data. Exposé builds on and extends existing ontologies for data mining and machine learning experimentation by incorporating classes and relationships to represent additional important concepts.
This document provides an overview of object-oriented programming concepts. It discusses what OOP is, the history and goals of OOP, and key concepts like objects, classes, interfaces, encapsulation, inheritance, and polymorphism. Specifically, it explains that OOP evolved from procedural programming to further abstract concepts through objects that contain both data and behaviors. It also discusses how encapsulation, inheritance, and polymorphism are the three main principles of OOP that help make software more comprehensible, maintainable, and reusable.
Helping Students to Learn Mathematics Beyond LMS (Martin Homik)
This document summarizes the ActiveMath learning environment, its goals and features. It discusses how ActiveMath uses artificial intelligence techniques like user modeling, adaptive interfaces and feedback to personalize learning. It also describes how ActiveMath represents knowledge semantically and generates courses by retrieving reusable content. Finally, it discusses how integrating ActiveMath with the Sakai learning management system provides user management and a way to make ActiveMath's tools and open content available within Sakai.
Interactive Visualizations for teaching, research, and dissemination (Scott A. Hale)
This document discusses the development of interactive data visualizations to allow non-technical researchers to more easily explore complex datasets. It notes the limitations of static images and outlines an open-source project to create browser-based tools for networks and maps. The project uses standards-compliant code and common data formats to produce interactive visualizations for teaching, research and dissemination.
This document introduces khmer, a platform for scalable sequence analysis. It discusses how khmer uses k-mers to provide implicit read alignments and assemble sequences using de Bruijn graphs. It also describes some of the challenges with k-mers, such as each sequencing error resulting in novel k-mers. The document outlines khmer's data structures and algorithms for efficiently counting k-mers and representing de Bruijn graphs. It discusses how khmer has been applied to real biological problems and highlights areas of current research using khmer, such as error correction, variant calling, and assembly-free comparisons of data sets.
Knowledge Infrastructure for Global Systems Science (David De Roure)
Presentation at the First Open Global Systems Science Conference, Brussels, 8-10 November 2012
http://www.gsdp.eu/nc/news/news/date/2012/10/31/first-open-global-systems-science-conference/
This document summarizes Andre Freitas' talk on AI beyond deep learning. It discusses representing meaning from text at scale using knowledge graphs and embeddings. It also covers using neuro-symbolic models like graph networks on top of knowledge graphs to enable few-shot learning, explainability, and transportability. The document advocates that AI engineers should focus on representation design and evaluating multi-component NLP systems.
This document provides an overview and introduction to GeoGebra, free dynamic mathematics software for learning and teaching geometry, algebra, and calculus. It can be installed from its website or carried on a USB drive. GeoGebra combines elements of dynamic geometry, computer algebra, and spreadsheets. It is used as a pedagogical tool for visualizing mathematical concepts, creating multiple representations, and allowing students to experiment and discover mathematics on their own. GeoGebra has an active international user community who contribute materials and help expand its capabilities through free and open-source development.
Afternoons with Azure - Azure Machine Learning (CCG)
A journey through programming languages such as R and Python that can be used for machine learning. Next, explore Azure Machine Learning Studio and see the interconnectivity.
For more information about Microsoft Azure, call (813) 265-3239 or visit www.ccganalytics.com/solutions
This document discusses openness and reproducibility in computational science. It begins with an introduction and background on the challenges of analyzing non-model organisms. It then describes the goals and challenges of shotgun sequencing analysis, including assembly, counting, and variant calling. It emphasizes the need for efficient data structures, algorithms, and cloud-based analysis to handle large datasets. The document advocates for open science practices like publishing code, data, and analyses to ensure reproducibility of computational results.
Introduction to Prolog (PROgramming in LOGic) (Ahmed Gad)
As part of an artificial intelligence course given in the Faculty of Computers and Information, Prolog was the first tool introduced for making intelligent decisions, such as establishing relations between different objects.
Prolog has a strong history in AI, starting in 1972 as a logic programming language that solves problems through logic.
Prolog is a general-purpose logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of relations, represented as facts and rules. A computation is initiated by running a query over these relations. The language was first conceived by a group around Alain Colmerauer in Marseille, France, in the early 1970s and the first Prolog system was developed in 1972 by Colmerauer with Philippe Roussel. Prolog was one of the first logic programming languages, and remains the most popular among such languages today, with several free and commercial implementations available. The language has been used for theorem proving, expert systems, type inference systems, and automated planning, as well as its original intended field of use, natural language processing. Modern Prolog environments support creating graphical user interfaces, as well as administrative and networked applications. Prolog is well-suited for specific tasks that benefit from rule-based logical queries such as searching databases, voice control systems, and filling templates.
This document provides guidance on how to become a competent data professional. It discusses the various types of data careers and skills required, including problem solving, statistics, programming, communication and business skills. It recommends taking online courses and finding a mentor, as well as gaining hands-on experience through competitions like Kaggle. With 5-6 years of consistent practice spending several hours per day learning, one can become competent in data skills. The document also addresses common questions for beginners and provides tips for progression in a data career.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language as well as RubyGems and Bundler, the package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Project Management Semester Long Project - Acuity (jpupo2018)
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing, and ingesting data into a serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. when a person document is used for shared mailboxes instead of a mail-in database. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring you up to speed on this new world. It will give you the tools and the know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices to put into use immediately
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might be that both are building blocks, or dependencies, of creative and software projects. The reality is that a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several LibreOffice-related events, migrations, and training. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).
4. What’s the big deal about Julia?
A high level language
with C-like speed
julialang.org/benchmarks
5. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
6. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
data abstraction
performance
7. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
data abstraction
performance
What if you didn’t have to choose between
data abstraction and performance?
15. Object-oriented programming with classes
What can I do with/to a thing?
Methods are attached to objects, and different objects carry different method lists (top up, pay fare, lose, buy for one card; only pay fare, lose, buy for another): classes are more fundamental than methods.
16. OOP with classes vs. multi-methods
What can I do with/to a thing?
Under multimethods, each method (top up, pay fare, lose, buy) becomes a method of a generic function: multimethods capture relationships between objects and functions instead of locking methods inside classes.
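A small sketch of the contrast in Julia (the fare-card types and functions are invented for illustration):

    # Hypothetical fare-card types: methods live in generic functions, not in classes.
    abstract type Card end
    struct PlasticCard <: Card
        balance::Float64
    end
    struct PaperTicket <: Card
        balance::Float64
    end

    # One generic function `payfare` with a method per card type.
    payfare(c::PlasticCard, fare) = PlasticCard(c.balance - fare)            # may go negative
    payfare(c::PaperTicket, fare) = PaperTicket(max(c.balance - fare, 0.0))  # cannot go negative

    # One fallback method of `topup` covers every Card subtype at once.
    topup(c::T, amount) where {T<:Card} = T(c.balance + amount)

    payfare(topup(PlasticCard(0.0), 10.0), 2.4)  # -> PlasticCard(7.6)

New card types and new operations can each be added without touching the other, which is the relationship the slide emphasizes.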
17. Multi-methods for linear algebra
What can I do with/to a thing?
Generic functions: eigvals (compute eigenvalues), eigfact (compute the spectral factorization), svdvals (compute singular values), svdfact (compute singular values and vectors)
Objects: Matrix, SymTridiagonal, Bidiagonal
Methods can take advantage of special matrix structures
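Concretely (a sketch; in current Julia the factorization functions eigfact and svdfact have been renamed eigen and svd, while eigvals and svdvals keep their names):

    using LinearAlgebra

    A = SymTridiagonal([2.0, 2.0, 2.0], [-1.0, -1.0])  # stores two vectors: O(n) memory
    B = Matrix(A)                                      # same matrix, dense O(n^2) storage

    eigvals(A)  # dispatches to a specialized symmetric-tridiagonal eigensolver
    eigvals(B)  # dispatches to the general dense method

Both calls return the same eigenvalues, but the SymTridiagonal method exploits the structure for speed and memory.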
24. easy to call external C
functions, e.g. CLAPACK
sstev, dstev…
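Julia's standard library wraps these LAPACK routines using the same foreign-function machinery (ccall) that user code can apply to any C library; a minimal sketch with the wrapped symmetric-tridiagonal solver (the input vectors are invented for illustration):

    using LinearAlgebra

    dv = [2.0, 2.0, 2.0]   # diagonal of a symmetric tridiagonal matrix
    ev = [-1.0, -1.0]      # off-diagonal

    # LAPACK.stev! wraps the Fortran routine dstev; 'N' requests eigenvalues only.
    vals, _ = LinearAlgebra.LAPACK.stev!('N', copy(dv), copy(ev))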
25. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings
26. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings (textbook algorithm)
27. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings
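As an illustration of a generic fallback (my example, not from the slides): the LAPACK-backed methods only accept machine floats, so a matrix of exact rationals dispatches to generic textbook routines written in Julia itself, which run unchanged over that ring.

    # Exact rational arithmetic: no specialized LAPACK method matches,
    # so dispatch falls back to generic implementations written in Julia.
    A = [1//1 1//2; 1//3 1//4]

    A^2      # generic matrix multiplication over the rationals
    inv(A)   # generic LU-based inverse: exact, with no floating-point roundoff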
33. Summary
Types allow users to express scientific computations
and the compiler to specialize code for performance
Other advanced features for performance: code generation,
native parallel computing, …
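The specialization is easy to observe with Julia's introspection macros (a sketch; the function f is an arbitrary example):

    using InteractiveUtils   # loaded automatically in the REPL

    f(x) = 2x + 1

    @code_typed f(1)     # body specialized for Int: integer arithmetic
    @code_typed f(1.0)   # body specialized for Float64: floating-point arithmetic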
MS246: High-level Technical Computing with Julia
4:25 PM - 6:05 PM in Room 254 B