Slides for a 20-minute presentation about Julia, with a brief introduction to multiple dispatch (multimethods) and how it is used for numerical linear algebra
Julia: Multimethods for abstraction and performance (Jiahao Chen)
This document discusses Julia's use of multimethod dispatch and generic functions to provide both data abstraction and high performance. Julia bridges the divide between computer science and computational science by allowing users to write generic code that can also achieve native machine performance when applicable. Its type system and multimethods allow methods to be specialized for different argument types, enabling both abstraction and optimized implementations.
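A minimal sketch of that idea in Julia (the function and types below are illustrative, not taken from the slides): one generic function carries several methods, and dispatch selects the most specific method for the argument types at hand.

    # One generic function, several methods: dispatch picks the most specific match.
    describe(x) = "some value"              # generic fallback for any type
    describe(x::Number) = "a number"        # specialization for all numbers
    describe(x::Integer) = "an integer"     # further specialization

    describe("hi")   # -> "some value"
    describe(2.5)    # -> "a number"
    describe(42)     # -> "an integer"

Because each method is compiled separately for the concrete argument types it receives, the abstraction adds no run-time cost.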
This document introduces Julia, an open-source programming language for technical computing with features like high performance, multiple dispatch, and ease of use. It has over 500 packages and 700 contributors. Julia bridges the gap between computer science and computational science by allowing for both high-level programming and high performance via just-in-time compilation. Its type system helps express scientific computations and improves performance. Other features include object-oriented programming with multiple dispatch, native parallelism, and domain-specific languages like JuMP for optimization. Julia has comparable or better performance than other numerical computing languages.
Julia? Why a new language, and an application to genomics data analysis (Jiahao Chen)
The document introduces Julia, a new programming language designed to be understandable by both programmers and compilers, allowing for high performance without sacrificing usability or flexibility. It discusses Julia's just-in-time compilation approach and how its design aims to overcome limitations of other languages like MATLAB, R, and Python. The document also provides an overview of Julia's community and ecosystem including package authors and contributors.
A Julia package for iterative SVDs with applications to genomics data analysis (Jiahao Chen)
This document discusses a Julia package for iterative singular value decompositions (SVDs) with applications to genomics data analysis. It introduces SVD and its use in genome-wide association studies to identify genetic factors associated with diseases and traits. It summarizes different iterative SVD methods like Lanczos iteration and challenges like loss of orthogonality. The document presents a new Julia package called FlashPCA that uses a blocked power method to quickly approximate the top SVD components, and compares its performance to other iterative SVD solvers on large genomic datasets.
This document summarizes work done by the Julia Labs at MIT on genomics data analysis and optimization of principal component analysis algorithms for genome-wide association studies. It describes how a native Julia implementation of PCA was able to reduce computation time for finding the top 10 principal components of an 80,000×40,000 genotype matrix from over 2,900 seconds to just 81 seconds. It also discusses how custom matrix-vector multiplication functions allowed the same computation speed while using 32x less memory by reading directly from a compressed data format. Future work directions include more complex analytics, improved data imputation methods, out-of-core matrix operations, and accessing different data formats.
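The custom matrix-vector product can be sketched as follows; the type name, field names, and two-bit packing scheme are hypothetical stand-ins for the actual compressed genotype format, which is not specified here.

    # Hypothetical sketch: wrap compressed genotype data in a matrix-like type and
    # give it its own multiplication method. Iterative PCA/SVD code that only ever
    # computes A*v then runs on the compressed data without ever expanding A.
    struct PackedGenotypeMatrix
        data::Vector{UInt8}   # four two-bit genotype codes packed per byte, column-major
        m::Int                # number of rows (samples)
        n::Int                # number of columns (markers)
    end

    Base.size(A::PackedGenotypeMatrix) = (A.m, A.n)

    function Base.:*(A::PackedGenotypeMatrix, v::AbstractVector{Float64})
        y = zeros(A.m)
        for j in 1:A.n, i in 1:A.m
            k = (j - 1) * A.m + i - 1            # zero-based linear index of entry (i, j)
            byte = A.data[(k >> 2) + 1]          # four entries stored per byte
            g = (byte >> (2 * (k & 3))) & 0x03   # extract the two-bit genotype code
            y[i] += g * v[j]
        end
        return y
    end

    A = PackedGenotypeMatrix(UInt8[0b11100100], 2, 2)  # the matrix [0 2; 1 3]
    A * [1.0, 1.0]                                     # -> [2.0, 4.0]

Multiple dispatch then routes every A*v inside a power iteration to this method automatically.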
Julia, genomics data and their principal components (Jiahao Chen)
This document summarizes the average ratings that different users have given to an Uber driver based on the user IDs and ratings provided. It shows solutions to calculating the average ratings in R, MATLAB, APL, C, and Julia. The solutions demonstrate different programming paradigms and language features like data abstraction, higher-order functions, vectorization, multiple dispatch, and dynamic typing in each language.
Meta-learning, or learning how to learn, is our innate ability to learn new, ever more complex tasks very efficiently by building on prior experience. It is a very exciting direction for machine learning (and AI in general). In this tutorial, I introduce the main concepts and state of the art.
The document discusses different approaches to meta-learning, or learning to learn. It begins by explaining how humans are able to learn new tasks more quickly by leveraging prior knowledge from similar tasks. It then covers three main approaches to meta-learning for machine learning models: 1) Starting with what generally works based on previous task performance data, 2) Starting from what is most likely to work for similar tasks based on task meta-features, and 3) Starting from previously trained models on very similar tasks via transfer learning. The document dives into various techniques within each of these three approaches, such as warm-starting optimization searches, learning task embeddings, and few-shot learning.
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability to effectively encode a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations and we learn property-specific vector representations of users and items by applying neural language models on the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning-to-rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
This document discusses last week's coding exercise on data harvesting and storage. It provides step-by-step explanations of code used to extract and analyze data from a JSON file. Examples include printing video titles, calculating the average number of tags per video, determining the most commented-on porn category, and finding the most frequently used words. The document also covers APIs, scrapers, file formats like JSON and CSV, and how to store extracted data.
Transfer Learning -- The Next Frontier for Machine Learning (Sebastian Ruder)
Sebastian Ruder gave a presentation on transfer learning in machine learning. He began by defining transfer learning as applying knowledge gained from solving one problem to a different but related problem. Transfer learning is now important because machine learning models have matured and are being widely deployed, but often lack labeled data for new tasks or domains. Ruder discussed examples of transfer learning in computer vision and natural language processing. He described his research focus on finding better ways to transfer knowledge between domains, tasks, and languages in large-scale, real-world applications.
This document discusses a lecture on data harvesting and storage. It covers APIs, RSS feeds, scraping and crawling as methods for collecting data from various sources. It also discusses storing data in formats like CSV, JSON, and XML. The document provides code examples for working with JSON data and discusses tools for long-term data collection like DMI-TCAT.
Robot Localisation: An Introduction - Luis Contreras 2020.06.09 | RoboCup@Hom... (robocupathomeedu)
The document summarizes Luis Contreras' upcoming lecture on robot localization using particle filters. The key points covered are:
1. Robot localization is the process of determining a robot's pose (position and orientation) over time using motion and sensor measurements within a map.
2. Particle filters represent the robot's uncertain pose as a set of weighted particles, with each particle being a hypothesis of the robot's state.
3. As the robot moves and senses its environment, the particles are propagated and weighted according to the motion and sensor models to estimate the posterior probability distribution over poses.
Open & reproducible research - What can we do in practice? (Felix Z. Hoffmann)
- There is a reproducibility crisis in computational research even when code is made available. Out of 206 computational studies in Science magazine since a policy change mandating sharing, only 26 directly provided their code and data. Of those judged potentially reproducible when code was available, more than half still required significant effort to reproduce.
- Making research fully reproducible requires addressing issues like difficult computational environments, long run times, dependency on previous results, and clarity on what is required to reproduce a single finding. Following principles like ensuring code is re-runnable, repeatable, reproducible, reusable, and replicable can help achieve reproducibility. Publishing code on platforms like Zenodo and OSF can also aid reproducibility.
Understanding ECG signals in the MIMIC II database (Jiahao Chen)
This document discusses understanding ECG signals in a database and building a computer model to analyze them. It notes that ECGs can reveal many heart problems and challenges include missing data, incomplete records, and abnormal data. It proposes extracting features from records and using techniques like PCA. A referenced study describes a dynamical model for generating synthetic ECG signals that models the placement and shape of peaks, breathing, and heartbeat variations. The goal is to understand normal behavior before detecting abnormalities.
Programming languages: history, relativity and design (Jiahao Chen)
R solution: Uses dplyr functions to group the data by user ID, then calculates the average rating for each user using summarize() and mean(). Returns a tibble with user ID and average rating.
MATLAB solution: Uses accumarray() to bin the ratings by user IDs, then calculates the mean rating within each bin. Returns a vector with average ratings for each unique user ID.
APL solution: Defines vectors for user IDs and ratings. Finds unique user IDs and indexes to bin ratings. Calculates the average rating within each bin by summing the product of ratings and indexes, then dividing by the sum of indexes. Returns vectors of average ratings and unique user IDs.
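The Julia solution is not reproduced here; a minimal sketch of one idiomatic approach (the sample data are invented for illustration):

    # Group ratings by user ID with a Dict, then average each group.
    ids     = [1, 2, 1, 3, 2]            # user ID for each rating (sample data)
    ratings = [5.0, 4.0, 3.0, 5.0, 2.0]  # the ratings themselves (sample data)

    groups = Dict{Int, Vector{Float64}}()
    for (id, r) in zip(ids, ratings)
        push!(get!(groups, id, Float64[]), r)
    end
    averages = Dict(id => sum(rs) / length(rs) for (id, rs) in groups)
    # -> Dict(1 => 4.0, 2 => 3.0, 3 => 5.0)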
A very short tour through the Julia community and how key features of the language interact to produce an expressive syntax that users like without sacrificing performance
Resolving the dissociation catastrophe in fluctuating-charge models (Jiahao Chen)
The document discusses issues that arise when using fluctuating charge models to describe chemical systems. It summarizes the concept of fluctuating charges based on electronegativity equalization. However, this leads to an unphysical "dissociation catastrophe" where charges do not decay to zero at infinite separation. The document proposes fixing this by introducing distance-dependent electronegativity or charge transfer variables between atoms to attenuate long-range charge transfer. It also discusses the topological relationship between charge transfer variables and atomic charges to convert between representations.
A brief introduction to Hartree-Fock and TDDFT (Jiahao Chen)
The document provides an overview of time-dependent density functional theory (TDDFT) for computing molecular excited states. It begins with an introduction to the Born-Oppenheimer approximation and variational principle. It then discusses the Hartree-Fock and Kohn-Sham equations as self-consistent field methods for calculating ground states, and linear response theory for calculating excited states within TDDFT. The contents section outlines the topics to be covered, including basis functions, Hartree-Fock theory, density functional theory, and time-dependent DFT.
Excitation Energy Transfer In Photosynthetic Membranes (Jiahao Chen)
This document summarizes research on excitation energy transfer in the light harvesting complex II (LHC-II) found in plants. It discusses how LHC-II is able to efficiently funnel light energy absorbed by chlorophyll pigments to the reaction center in picoseconds using two mechanisms: Dexter electron exchange and Förster dipole-dipole interactions. Computational modeling of LHC-II's atomic coordinates and transition dipole strengths shows that the strongest couplings between chlorophylls lead to the fastest energy transfer rates, with an overall light harvesting efficiency of 98.7% and a mean passage time of 13.52 picoseconds for excitons to travel between pigments.
1. Atomic charges are easy to define for isolated atoms or ions but difficult to quantify precisely for atoms in molecules.
2. When bonds form, electrons redistribute such that atomic charges may become fractional, with the bonding electron pair sometimes lying between or shifted toward one atomic center.
3. Mulliken developed the concept of electronegativity to describe how electrons are redistributed in molecules based on differences in atoms' abilities to attract electrons.
The document discusses the Julia programming language. It highlights that Julia bridges the gap between computer science and computational science by allowing for both data abstraction and high performance. Julia uses multiple dispatch as its core programming paradigm, which allows functions to have different implementations depending on the types of their arguments. This enables Julia to perform efficiently on a wide range of technical computing tasks.
Theory and application of fluctuating-charge models (Jiahao Chen)
This document discusses fluctuating charge models, which map molecules to electrical circuits by representing atomic charges as voltages and charge-charge interactions as capacitances. It introduces the QEq model, which has problems with metallicity and incorrect charge transfer asymptotics. The document then presents the new QTPIE model, which addresses these issues by representing charges as charge transfer variables between atoms and incorporating an exponential distance-dependent attenuation of voltages. This gives QTPIE the correct charge behavior for dissociated systems.
The document discusses NumPy and SciPy, two popular Python packages for scientific computing. NumPy adds support for large, multi-dimensional arrays and matrices to Python. It also introduces numeric data types and provides operations such as linear algebra on array objects. SciPy builds on NumPy and contains modules for optimization, integration, interpolation and other tasks. Together, NumPy and SciPy provide a powerful yet easy-to-use environment for numerical computing in Python.
OSCON 2014: Data Workflows for Machine Learning (Paco Nathan)
This document provides examples of different frameworks that can be used for machine learning data workflows, including KNIME, Python, Julia, Summingbird, Scalding, and Cascalog. It describes features of each framework such as KNIME's large number of integrations and visual workflow editing, Python's broad ecosystem, Julia's performance and parallelism support, Summingbird's ability to switch between Storm and Scalding backends, and Scalding's implementation of the Scala collections API over Cascading for compact workflow code. The document aims to familiarize readers with options for building machine learning data workflows.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
The document summarizes Lynn Cherny's work setting up a data science program at emlyon business school. It discusses the courses taught in the first year of the program and plans for the next year. It also describes a student project analyzing job postings using skills extracted from text with word embeddings to identify gaps between teaching and job requirements. Ideas are proposed for improving the curriculum and student job searches.
This is a presentation I did for the new interns at Duo Software, in which I highlight the pros and cons of being creative and of following widely used best practices in software development.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
The document describes an ontology called Exposé that was created for machine learning experimentation. The ontology aims to formally represent key aspects of machine learning experiments such as algorithm specifications, implementations, applications, experimental contexts, evaluation functions, and structured data. Exposé builds on and extends existing ontologies for data mining and machine learning experimentation by incorporating classes and relationships to represent additional important concepts.
This document provides an overview of object-oriented programming concepts. It discusses what OOP is, the history and goals of OOP, and key concepts like objects, classes, interfaces, encapsulation, inheritance, and polymorphism. Specifically, it explains that OOP evolved from procedural programming to further abstract concepts through objects that contain both data and behaviors. It also discusses how encapsulation, inheritance, and polymorphism are the three main principles of OOP that help make software more comprehensible, maintainable, and reusable.
Helping Students to Learn Mathematics Beyond LMS (Martin Homik)
This document summarizes the ActiveMath learning environment, its goals and features. It discusses how ActiveMath uses artificial intelligence techniques like user modeling, adaptive interfaces and feedback to personalize learning. It also describes how ActiveMath represents knowledge semantically and generates courses by retrieving reusable content. Finally, it discusses how integrating ActiveMath with the Sakai learning management system provides user management and a way to make ActiveMath's tools and open content available within Sakai.
Interactive Visualizations for teaching, research, and dissemination (Scott A. Hale)
This document discusses the development of interactive data visualizations to allow non-technical researchers to more easily explore complex datasets. It notes the limitations of static images and outlines an open-source project to create browser-based tools for networks and maps. The project uses standards-compliant code and common data formats to produce interactive visualizations for teaching, research and dissemination.
This document introduces khmer, a platform for scalable sequence analysis. It discusses how khmer uses k-mers to provide implicit read alignments and assemble sequences using de Bruijn graphs. It also describes some of the challenges with k-mers, such as each sequencing error resulting in novel k-mers. The document outlines khmer's data structures and algorithms for efficiently counting k-mers and representing de Bruijn graphs. It discusses how khmer has been applied to real biological problems and highlights areas of current research using khmer, such as error correction, variant calling, and assembly-free comparisons of data sets.
Knowledge Infrastructure for Global Systems Science (David De Roure)
Presentation at the First Open Global Systems Science Conference, Brussels, 8-10 November 2012
http://www.gsdp.eu/nc/news/news/date/2012/10/31/first-open-global-systems-science-conference/
This document summarizes Andre Freitas' talk on AI beyond deep learning. It discusses representing meaning from text at scale using knowledge graphs and embeddings. It also covers using neuro-symbolic models like graph networks on top of knowledge graphs to enable few-shot learning, explainability, and transportability. The document advocates that AI engineers should focus on representation design and evaluating multi-component NLP systems.
This document provides an overview and introduction to GeoGebra, free dynamic mathematics software for learning and teaching geometry, algebra, and calculus. It can be installed from its website or carried on a USB drive. GeoGebra combines elements of dynamic geometry, computer algebra, and spreadsheets. It is used as a pedagogical tool for visualizing mathematical concepts, creating multiple representations, and allowing students to experiment and discover mathematics on their own. GeoGebra has an active international user community who contribute materials and help expand its capabilities through free and open-source development.
Afternoons with Azure - Azure Machine Learning (CCG)
A journey through programming languages such as R and Python that can be used for machine learning. Next, explore Azure Machine Learning Studio and see the interconnectivity.
For more information about Microsoft Azure, call (813) 265-3239 or visit www.ccganalytics.com/solutions
This document discusses openness and reproducibility in computational science. It begins with an introduction and background on the challenges of analyzing non-model organisms. It then describes the goals and challenges of shotgun sequencing analysis, including assembly, counting, and variant calling. It emphasizes the need for efficient data structures, algorithms, and cloud-based analysis to handle large datasets. The document advocates for open science practices like publishing code, data, and analyses to ensure reproducibility of computational results.
Introduction to Prolog (PROgramming in LOGic) (Ahmed Gad)
As part of an artificial intelligence course given in the Faculty of Computers and Information, Prolog was the first tool introduced for making intelligent decisions, such as establishing relations between different objects.
Prolog has a strong history in AI, starting in 1972 as a logic programming language that solves problems through logic.
Prolog is a general-purpose logic programming language associated with artificial intelligence and computational linguistics. Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of relations, represented as facts and rules. A computation is initiated by running a query over these relations. The language was first conceived by a group around Alain Colmerauer in Marseille, France, in the early 1970s and the first Prolog system was developed in 1972 by Colmerauer with Philippe Roussel. Prolog was one of the first logic programming languages, and remains the most popular among such languages today, with several free and commercial implementations available. The language has been used for theorem proving, expert systems, type inference systems, and automated planning, as well as its original intended field of use, natural language processing. Modern Prolog environments support creating graphical user interfaces, as well as administrative and networked applications. Prolog is well-suited for specific tasks that benefit from rule-based logical queries such as searching databases, voice control systems, and filling templates.
This document provides guidance on how to become a competent data professional. It discusses the various types of data careers and skills required, including problem solving, statistics, programming, communication and business skills. It recommends taking online courses and finding a mentor, as well as gaining hands-on experience through competitions like Kaggle. With 5-6 years of consistent practice spending several hours per day learning, one can become competent in data skills. The document also addresses common questions for beginners and provides tips for progression in a data career.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language as well as RubyGems and Bundler, the package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Project Management Semester Long Project - Acuity (jpupo2018)
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing, and ingesting data into a serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. when a person document is used for shared mailboxes instead of a mail-in database. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring you up to speed on this new world. It will give you the tools and the know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices to put into use immediately
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might be that both are building blocks, or dependencies, of creative and software projects. The reality is that a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several LibreOffice-related events, migrations, and training. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (the origin of her nickname, deneb_alpha).
4. What’s the big deal about Julia?
A high level language
with C-like speed
julialang.org/benchmarks
5. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
6. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
data abstraction
performance
7. It bridges the divide between computer science
and computational science
What’s the big deal about Julia?
data abstraction
performance
What if you didn’t have to choose between
data abstraction and performance?
15. Object-oriented programming with classes
What can I do with/to a thing?
Methods are attached to objects, and different objects carry different method lists (top up, pay fare, lose, buy for one card; only pay fare, lose, buy for another): classes are more fundamental than methods.
16. OOP with classes vs. multi-methods
What can I do with/to a thing?
Under multimethods, each method (top up, pay fare, lose, buy) becomes a method of a generic function: multimethods capture relationships between objects and functions instead of locking methods inside classes.
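A small sketch of the contrast in Julia (the fare-card types and functions are invented for illustration):

    # Hypothetical fare-card types: methods live in generic functions, not in classes.
    abstract type Card end
    struct PlasticCard <: Card
        balance::Float64
    end
    struct PaperTicket <: Card
        balance::Float64
    end

    # One generic function `payfare` with a method per card type.
    payfare(c::PlasticCard, fare) = PlasticCard(c.balance - fare)            # may go negative
    payfare(c::PaperTicket, fare) = PaperTicket(max(c.balance - fare, 0.0))  # cannot go negative

    # One fallback method of `topup` covers every Card subtype at once.
    topup(c::T, amount) where {T<:Card} = T(c.balance + amount)

    payfare(topup(PlasticCard(0.0), 10.0), 2.4)  # -> PlasticCard(7.6)

New card types and new operations can each be added without touching the other, which is the relationship the slide emphasizes.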
17. Multi-methods for linear algebra
What can I do with/to a thing?
Generic functions: eigvals (compute eigenvalues), eigfact (compute the spectral factorization), svdvals (compute singular values), svdfact (compute singular values and vectors)
Objects: Matrix, SymTridiagonal, Bidiagonal
Methods can take advantage of special matrix structures
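Concretely (a sketch; in current Julia the factorization functions eigfact and svdfact have been renamed eigen and svd, while eigvals and svdvals keep their names):

    using LinearAlgebra

    A = SymTridiagonal([2.0, 2.0, 2.0], [-1.0, -1.0])  # stores two vectors: O(n) memory
    B = Matrix(A)                                      # same matrix, dense O(n^2) storage

    eigvals(A)  # dispatches to a specialized symmetric-tridiagonal eigensolver
    eigvals(B)  # dispatches to the general dense method

Both calls return the same eigenvalues, but the SymTridiagonal method exploits the structure for speed and memory.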
24. easy to call external C
functions, e.g. CLAPACK
sstev, dstev…
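Julia's standard library wraps these LAPACK routines using the same foreign-function machinery (ccall) that user code can apply to any C library; a minimal sketch with the wrapped symmetric-tridiagonal solver (the input vectors are invented for illustration):

    using LinearAlgebra

    dv = [2.0, 2.0, 2.0]   # diagonal of a symmetric tridiagonal matrix
    ev = [-1.0, -1.0]      # off-diagonal

    # LAPACK.stev! wraps the Fortran routine dstev; 'N' requests eigenvalues only.
    vals, _ = LinearAlgebra.LAPACK.stev!('N', copy(dv), copy(ev))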
25. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings
26. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings (textbook algorithm)
27. So how does this help us with linear algebra?
Multi-method dispatch with generic fallbacks
Matrix operations on general rings
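As an illustration of a generic fallback (my example, not from the slides): the LAPACK-backed methods only accept machine floats, so a matrix of exact rationals dispatches to generic textbook routines written in Julia itself, which run unchanged over that ring.

    # Exact rational arithmetic: no specialized LAPACK method matches,
    # so dispatch falls back to generic implementations written in Julia.
    A = [1//1 1//2; 1//3 1//4]

    A^2      # generic matrix multiplication over the rationals
    inv(A)   # generic LU-based inverse: exact, with no floating-point roundoff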
33. Summary
Types allow users to express scientific computations
and the compiler to specialize code for performance
Other advanced features for performance: code generation,
native parallel computing, …
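The specialization is easy to observe with Julia's introspection macros (a sketch; the function f is an arbitrary example):

    using InteractiveUtils   # loaded automatically in the REPL

    f(x) = 2x + 1

    @code_typed f(1)     # body specialized for Int: integer arithmetic
    @code_typed f(1.0)   # body specialized for Float64: floating-point arithmetic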
MS246: High-level Technical Computing with Julia
4:25 PM - 6:05 PM in Room 254 B