Eclipse Science provides 10 projects related to scientific computing, including projects for visualization, chemistry analysis, data analysis, numerical data manipulation, user interfaces for large datasets, scanning instruments, R programming, LaTeX documents, scientific workflows, and an integrated computational environment. The projects involve various companies and organizations and aim to support scientific domains through open-source Eclipse technologies and APIs.
WhizzML is a domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.
Agile Data Science: Hadoop Analytics Applications (Russell Jurney)
This document provides instructions and examples for analyzing and visualizing event data in an agile manner. It discusses loading event data stored in Avro format using tools like Pig and displaying the data in a browser. Specific steps outlined include using Cat to view Avro data, loading the data into Pig and using Illustrate to view sample records. The overall approach emphasized is to work with atomic event data in an iterative way using Pig and other Hadoop tools to explore and visualize the data.
This document discusses building agile analytics applications. It recommends taking an iterative approach where data is explored interactively from the start to discover insights. Rather than designing insights upfront, the goal is to build an application that facilitates exploration of the data to uncover insights. This is done by setting up an environment where insights can be repeatedly produced and shared with the team. The focus is on using simple, flexible tools that work from small local data to large datasets.
Agile Data Science: Building Hadoop Analytics Applications (Russell Jurney)
This document discusses building agile analytics applications with Hadoop. It outlines several principles for developing data science teams and applications in an agile manner. Some key points include:
- Data science teams should be small, around 3-4 people with diverse skills who can work collaboratively.
- Insights should be discovered through an iterative process of exploring data in an interactive web application, rather than trying to predict outcomes upfront.
- The application should start as a tool for exploring data and discovering insights, which then becomes the palette for what is shipped.
- Data should be stored in a document format like Avro or JSON rather than a relational format to reduce joins and better represent semi-structured data.
Agile Data Science 2.0 (O'Reilly 2017) defines a methodology and a software stack with which to apply the methods. *The methodology* seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. *The stack* is but one example meeting the requirements that it be utterly scalable and utterly efficient in use by application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Airflow (incubating), MongoDB, ElasticSearch, Apache Parquet, Python/Flask, and jQuery. This talk covers the full lifecycle of big data application development and shows how to use lessons from agile software engineering to apply data science with this full stack to build better analytics applications. The system starts with plumbing, moves on to data tables, charts and search, through interactive reports, and builds towards predictions in both batch and realtime (defining the role of each), the deployment of predictive systems, and how to iteratively improve predictions that prove valuable.
This document provides an overview of Java collections APIs, including:
- A history of collections interfaces added in different JDK versions from 1.0 to 1.7.
- Descriptions of common collection interfaces like List, Set, Map and their implementations.
- Comparisons of performance and characteristics of different collection implementations.
- Explanations of common collection algorithms and concurrency utilities.
- References for further reading on collections and concurrency.
Agile Data Science 2.0 covers the theory and practice of applying agile methods to the practice of applied analytics research called data science. The book takes the stance that data products are the preferred output format for data science teams to effect change in an organization. Accordingly, we show how to "get meta" to enable agility in building applications describing the applied research process itself. Then we show how to use 'big data' tools to iteratively build, deploy and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development through deployment and then through iterative improvement. Application development is a fundamental skill for a data scientist, and by publishing your data science work as a web application, we show you how to effect maximal change within your organization.
Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, ElasticSearch and Apache Airflow.
This document summarizes a presentation given by Diane Mueller from ActiveState and Dr. Mike Müller from Python Academy. It compares MATLAB and Python capabilities for scientific computing. Python has many libraries like NumPy, SciPy, IPython and matplotlib that provide similar functionality to MATLAB. Together these are often called "Pylab". The presentation provides an overview of Python, NumPy arrays, visualization with matplotlib, and integrating Python with other languages.
The document describes a dataset containing on-time performance records for 95% of commercial flights in the United States. It includes over 30 fields of information for each flight such as airline, departure/arrival times, delays, distances, and causes of delays. An example record from the dataset is shown containing values for many of the fields.
PHASE (Philly Area Scala Enthusiasts) - Word2vec in Scala. Talk explains concrete examples of how Word2vec works, built around a demo of constructing email alerts using concept search.
Spark schema for free with David Szakallas (Databricks)
DataFrames are essential for high-performance code, but sadly lag behind in development experience in Scala. When we started migrating our existing Spark application from RDDs to DataFrames at Whitepages, we had to scratch our heads real hard to come up with a good solution. DataFrames come at the cost of compile-time type safety, and there is limited support for encoding JVM types.
We wanted more descriptive types without the overhead of Dataset operations. The data binding API should be extendable. Schema for input files should be generated from classes when we don’t want inference. UDFs should be more type-safe. Spark does not provide these natively, but with the help of shapeless and type-level programming we found a solution to nearly all of our wishes. We migrated the RDD code without any of the following: changing our domain entities, writing schema description or breaking binary compatibility with our existing formats. Instead we derived schema, data binding and UDFs, and tried to sacrifice the least amount of type safety while still enjoying the performance of DataFrames.
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell (Databricks)
Nested data types offer Apache Spark users powerful ways to manipulate structured data. In particular, they allow you to put complex objects like arrays, maps and structures inside of columns. This can help you model your data in a more natural way.
While this feature is certainly useful, it can be quite cumbersome to manipulate data inside of complex objects because SQL (and Spark) do not have primitives for working with such data. In addition, it is time-consuming, non-performant, and non-trivial. During this talk we will discuss some of the commonly used techniques for working with complex objects, and we will introduce new ones based on higher-order functions. Higher-order functions will be part of Spark 2.4 and are a simple and performant extension to SQL that allow a user to manipulate complex data such as arrays.
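For a flavour of what those higher-order functions look like, here is a minimal, self-contained sketch using Spark's Java API (Spark 2.4+; the application name and the "values" column are made up for the example):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class HigherOrderFunctionsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hof-sketch").master("local[*]").getOrCreate();

        // A one-row table whose "values" column is an array.
        Dataset<Row> df = spark.sql("SELECT array(1, 2, 3, 4) AS values");

        // transform and filter are SQL higher-order functions (Spark 2.4+):
        // the lambda runs per element, without exploding and re-grouping the array.
        df.selectExpr(
                "transform(values, x -> x * x) AS squares",
                "filter(values, x -> x % 2 = 0) AS evens"
        ).show(false);

        spark.stop();
    }
}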
Running Intelligent Applications inside a Database: Deep Learning with Python... (Miguel González-Fierro)
In this talk we present a new paradigm of computation where the intelligence is computed inside the database. Standard software systems must pull the data from the database to execute a routine; if the data is large, this data movement introduces inefficiencies. Stored procedures tried to solve this issue in the past by allowing simple functions to be computed inside the database, but only simple routines can be executed.
To showcase the capabilities of our new system, we created a lung cancer detection algorithm using Microsoft’s Cognitive Toolkit, also known as CNTK. We used transfer learning between ImageNet dataset, which contains natural images, and a lung cancer dataset, which contains scans of horizontal sections of the lung for healthy and sick patients. Specifically, a pretrained Convolutional Neural Network on ImageNet is used on the lung cancer dataset to generate features. Once the features are computed, a boosted tree is applied to predict whether the patient has cancer or not.
All this process is computed inside the database, so the data movement is minimized. We are even able to execute the algorithm using the GPU of the virtual machine that hosts the database. Using a GPU, we can compute the featurization in less than 1h, in contrast to using a CPU, that would take up to 32h. Finally, we set up an API to connect the solution to a web app, where a doctor can analyze the images and get a prediction of a patient.
A lecture given for Stats 285 at Stanford on October 30, 2017. I discuss how OSS technology developed at Anaconda, Inc. has helped to scale Python to GPUs and Clusters.
Standardizing arrays -- Microsoft Presentation (Travis Oliphant)
This document discusses standardizing N-dimensional arrays (tensors) in Python. It proposes creating a "uarray" interface that downstream libraries could use to work with different array implementations in a common way. This would include defining core concepts like shape, data type, and math operations for arrays. It also discusses collaborating with mathematicians on formalizing array operations and learning from NumPy's generalized ufunc approach. The goal is to enhance Python's array ecosystem and allow libraries to work across hardware backends through a shared interface rather than depending on a single implementation.
This is part of an introductory course on Big Data Tools for Artificial Intelligence. These slides introduce students to the in-memory cluster computing framework Spark.
This document provides an overview of data science and machine learning with Anaconda. It begins with an introduction to Travis Oliphant, the founder of Continuum Analytics. It then discusses how Continuum created two organizations, NumFOCUS and Continuum Analytics, to support open source scientific computing and provide enterprise software and services. The rest of the document outlines how data science and machine learning are growing rapidly with Python and describes some of Anaconda's key capabilities for data science workflows and empowering data science teams.
Reproducible, Open Data Science in the Life Sciences (Eamonn Maguire)
The document outlines the workflow of a data scientist, from planning experiments and collecting data, to analyzing, visualizing, and publishing results. It emphasizes that data science involves formalizing hypotheses based on observations and testing them using collected data. A suite of open-source tools is presented to help data scientists in managing data and supporting open, reproducible life science research. The goal is to enable integration and sharing of experimental data and results.
This document discusses building full stack data analytics applications using Apache Kafka and Apache Spark. It provides an overview of agile data science principles and methodologies. It also outlines various tools that can be used in the data pipeline and stack, such as Apache Spark, Apache Kafka, MongoDB, Elasticsearch, and d3.js. It discusses considerations for data structure and access patterns, as well as climbing the data value pyramid from raw data to higher order insights.
EuroPython 2015 - Big Data with Python and Hadoop (Max Tepkeev)
Big Data - these two words are heard so often nowadays. But what exactly is Big Data? Can we, Pythonistas, enter the wonder world of Big Data? The answer is definitely “Yes”.
This talk is an introduction to big data processing using Apache Hadoop and Python. We’ll talk about Apache Hadoop, its concepts, infrastructure and how one can use Python with it. We’ll compare the speed of Python jobs under different Python implementations, including CPython, PyPy and Jython, and also discuss what Python libraries are available out there to work with Apache Hadoop.
The primary focus of this presentation is approaching the migration of a large, legacy data store into a new schema built with Django. Includes discussion of how to structure a migration script so that it will run efficiently and scale. Learn how to recognize and evaluate trouble spots.
Also discusses some general tips and tricks for working with data and establishing a productive workflow.
The document discusses strategies for migrating large amounts of legacy data from an old database into a new Django application. Some key points:
- Migrating data in batches and minimizing database queries per row processed can improve performance for large datasets.
- Tools like SQLAlchemy and Maatkit can help optimize the migration process.
- It's important to profile queries, enable logging/debugging, and design migrations that can resume/restart after failures or pause for maintenance.
- Preserving some legacy metadata like IDs on the new models allows mapping data between the systems. Declarative and modular code helps scale the migration tasks.
Graph Databases in the Microsoft Ecosystem (Marco Parenzan)
With SQL Server and Cosmos DB we now have graph databases broadly available, after being studied for decades in database theory, or being a niche approach in open source with Neo4j. And then there are services like Microsoft Graph and Azure Digital Twins that give us vertical implementations of graphs. So let's take a walk around graphs in the Microsoft ecosystem.
Konstantin will tell us about the challenges his team faced during this app's development and about their decisions on frameworks, libraries, patterns, and analytics. It's always interesting to know how mobile development for different mobile platforms goes in large corporations like Microsoft.
Connect with me: https://www.linkedin.com/profile/view?id=60116085
Data Structures CS301 PowerPoint Slides, Lecture 01 (shaziabibi5)
This lecture covers data structures and their implementation in C++. It discusses how data structures organize data to make programs more efficient. Common data structures that will be covered include dynamic arrays, linked lists, stacks, queues, trees and graphs. The lecture emphasizes that each data structure has costs and benefits depending on the problem, and the goal is to select the most appropriate structure. It also introduces arrays as a basic built-in data structure in many languages and how dynamic arrays can be used when the size is unknown at compile time.
The document describes Cocovila, a visual domain-specific language development environment that allows for the declarative modeling and simulation of systems through visual specification and automatic program synthesis. It discusses how Cocovila allows users to visually specify models using predefined components with input and output ports, automatically generating editor and simulation code. An example demonstration of using Cocovila to model a predator-prey system and an electro-hydraulic servo valve is also provided.
The document discusses an iOS application called D8iOS that allows iOS developers to access content from a Drupal 8 backend. It provides an overview of D8iOS, demonstrates how it uses an SDK and networking library to interface with Drupal's RESTful API, and discusses some benefits and limitations of using Drupal as a backend for mobile apps. The presenter then demonstrates D8iOS with a live demo.
This document introduces ClojureScript and building applications with it. It discusses how ClojureScript compiles Clojure to JavaScript and can run anywhere JavaScript runs. It covers the basics of the ClojureScript language like syntax, data structures, and functions. It also discusses tools for ClojureScript development like Leiningen, Figwheel, Shadow CLJS, and Cursive. Additionally, it covers building web applications with ClojureScript using templates like Hiccup and libraries like Reagent and Reframe.
[HES2013] Nifty stuff that you can still do with Android by Xavier Martin (Hackito Ergo Sum)
Fact: It is generally assumed that reverse engineering of Android applications is much easier than on other architectures. Static program analysis is the way to go. You can go back and forth between application and bytecode assembly without much hassle.
Reality: A few techniques are making their comeback on this platform, namely dynamic code loading and self-modifying code: bringing the fun back! Source code examples will be shown, with step-by-step explanation.
https://www.hackitoergosum.org
Groovy is a dynamic language for the Java Virtual Machine that simplifies programming through features like closures, properties, and built-in support for lists, maps, ranges, and regular expressions. The latest version 1.5 adds support for Java 5 features like annotations and generics to leverage frameworks that use them. Groovy can be integrated into applications through mechanisms like JSR-223, Spring, and Groovy's own GroovyClassLoader to externalize business rules, provide extension points, and customize applications.
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio (Alluxio, Inc.)
Alluxio Global Online Meetup
Apr 23, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Jiao (Jennie) Wang, Intel
Tsai Louie, Intel
Bin Fan, Alluxio
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced, without the network becoming an I/O bottleneck.
Intel Analytics Zoo is a unified data analytics and AI platform open-sourced by Intel. It seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink, and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data. Alluxio, as an open-source data orchestration layer, accelerates data loading and processing in Analytics Zoo deep learning applications.
This talk, we will go over:
- What is Analytics Zoo and how it works
- How to run Analytics Zoo with Alluxio in deep learning applications
- Initial performance benchmark results using the Analytics Zoo + Alluxio stack
- Scala is being adopted for web platforms, trading platforms, financial modeling, and simulation. Scala 2.9 includes improvements to parallel and concurrent computing libraries as well as faster compilation.
- Play Framework 2.0 will move to a Scala core while retaining Java support. The Scala Eclipse IDE has been reworked for better reliability and responsiveness.
- Scala 2.10 will include a new reflection framework and other IDE improvements. Avoiding mutable state enables functional and parallel programming. Scala supports both parallelism and concurrency through tools like parallel collections and actors.
- Future work includes distributed collections, parallel domain-specific languages, and unifying the Scala compiler and reflection APIs.
apidays LIVE Australia 2021 - Tracing across your distributed process boundar... (apidays)
apidays LIVE Australia 2021 - Accelerating Digital
September 15 & 16, 2021
Tracing across your distributed process boundaries using OpenTelemetry
Dasith Wijes, Senior Consultant at Microsoft (Azure Cloud & AI Team)
Java is an object-oriented programming language created by James Gosling at Sun Microsystems in 1995. It is platform independent, meaning programs written in Java can run on any system that supports Java without needing to be recompiled. The document provides an overview of Java, including its history and development, basic concepts like classes and objects, and how to write simple Java programs. It also discusses Java's advantages like being simple, object-oriented, portable, multithreaded, and secure.
Building modular software with OSGi - Ulf Fildebrandt (mfrancis)
The document discusses how to build modular software using OSGi: defining modules around difficult design decisions, applying principles like dependency injection and the Liskov substitution principle, and measuring modularity through metrics analyzed by tools like ConQAT to compare the desired architecture with the actual implementation. The goals of modularity are to manage complexity through separation of concerns into interchangeable modules and to allow systems to evolve over time through substitutability and extensibility of modules.
How does Google App Engine deal with digital art? How about music? A few case studies developed by Stinkdigital with Google Creative Lab, and how App Engine dealt with a considerable amount of visits.
I just made a change to the database schema, but now the team needs it for my feature to work. How can I keep track of my database changes and communicate them to the rest of the team? Migrations give a structured way to structurally alter your database structure as your application evolves . . . structurally. They also provide a way for everyone on the team: developers, testers, CI admins, DBAs, etc, to apply the latest changes wherever they are needed - with uniformity and low friction. Fluent Migrations for .NET provide a discoverable, human readable API that supports dozens of different databases (including SQL Server, PostgreSQL, Oracle). Topics covered in this session:
* Why you should use migrations
* How to write fluent migrations
* A look behind the scenes of how fluent migrations work
* Drawbacks/downsides to using migrations
* Other migration options for EF and NoSQL (Couchbase)
Ad hoc SQL scripts make you want to flip a desk? Keep your team on the same page with fluent migrations.
(This session will briefly mention EF Migrations, but is not primarily about EF).
Jump Start into Apache® Spark™ and Databricks (Databricks)
These are the slides from the Jump Start into Apache Spark and Databricks webinar on February 10th, 2016.
---
Spark is a fast, easy to use, and unified engine that allows you to solve many Data Sciences and Big Data (and many not-so-Big Data) scenarios easily. Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. We will leverage Databricks to quickly and easily demonstrate, visualize, and debug our code samples; the notebooks will be available for you to download.
The Science Working Group is an international collaboration of scientific organizations that develops open-source software tools for scientific research. It has 15 member organizations from fields like neutron sources and synchrotrons. The group created reusable software like the General Data Analysis framework and DAWN data analysis workbench. Recent projects included adopting new technologies like OSGi and developing SWMR file support and extensions to DAWN like a Fano factor image filter.
- DawnScience is an open source not-for-profit project on GitHub developed by Diamond Light Source Ltd. and the ESRF, which are largely publicly funded research facilities
- The project provides software for controlling experiments, analyzing and visualizing multi-dimensional scientific data, and running analysis pipelines during experiments
- It utilizes several Eclipse and Apache technologies and dependencies including Eclipse RCP, Buckminster, Git/eGit, GEF, Draw2D, and HDF5
EclipseCon Europe 2014: How to use DAWN Science Project (Matthew Gerring)
This document summarizes the DawnScience Eclipse project, which is an open source not-for-profit project on GitHub. It aims to provide APIs and reference implementations for loading, describing, slicing, transforming, and plotting multidimensional scientific data. Phase 1 from 2014-2015 defined long-term APIs and a reference implementation for HDF5 loading, data description, plotting, and slicing interfaces. Phase 2 in 2016 will release concrete implementations. The project utilizes Eclipse technologies and collaborates with scientific facilities.
Eclipse science group presentation given at Eclipse Converge and Devoxx 2017 in California. These slides give an overview of projects in the Eclipse Science working group in 2017.
Immersive Learning That Works: Research Grounding and Paths Forward (Leonel Morgado)
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu... (Scintica Instrumentation)
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita... (Advanced-Concepts-Team)
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of the Moon and artificial satellites
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
This presentation covers the major details of the micronuclei test: its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism; micronuclei form during chromosomal separation at metaphase.
The cost of acquiring information by natural selection (Carl Bergstrom)
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The binding of cosmological structures by massless topological defects (Sérgio Sacani)
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
ESR spectroscopy in liquid food and beverages.pptx (PRIYANKA PATEL)
With an increasing population, people need to rely on packaged foodstuffs, and packaging requires the preservation of the food. Among the various preservation treatments, irradiation is one of the most common and most harmless, as it does not alter the necessary micronutrients of the food. Although irradiated food does not harm human health, quality assessment is still required to provide consumers with the necessary information. ESR spectroscopy is a sophisticated way to investigate the quality of food and the free radicals induced during its processing; the ESR spin-trapping technique is useful for detecting highly unstable radicals in food, and the antioxidant capability of liquid food and beverages is mainly assessed with it.
3. Eclipse Advanced Visualization Project
projects.eclipse.org/projects/science.eavp
Visualization is a critical part of science and engineering projects and has roles in both setting up problems and post-processing results. The input or "construction" side can include things like constructing 3D geometries or volume meshes of physical space, and the post-processing side can include everything from visualizing those geometries and meshes to plotting results, analyzing images, visualizing real data, and almost everything else imaginable. There are numerous technologies for performing these tasks and most of them, with the exception of SWT-XY-GRAPH, are unavailable natively in the Eclipse ecosystem.
Active Member Companies:
4. Eclipse ChemClipse
projects.eclipse.org/projects/science.chemclipse
Active Member Companies:
Eclipse ChemClipse supports the user in analysing data acquired from systems used in analytical chemistry, in particular chromatography coupled with mass spectrometry (GC/MS) or flame-ionization detectors (GC/FID).
• Converter (import and/or export of raw data sets)
• Classifier (non-destructive methods to extract characteristic values)
• Filter (destructive methods to optimize the data sets)
• Peak detection (finding peaks – each peak is a chemical substance)
• Chromatogram/Peak integration (calculation of the chromatogram/peak area)
• Identification (identification of each peak mass spectrum)
• Quantitation (use the data for calibration issues)
• Reporting (report the results for further analytical steps)
• Processing (automation of the data handling)
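As a rough illustration of the peak-detection step above (this is not the ChemClipse API; the class and method here are invented for the sketch), a detector can scan an intensity trace for local maxima that rise above a noise threshold:

// Illustrative sketch only: ChemClipse's real detectors also smooth the
// signal, correct the baseline and use derivative tests.
public final class SimplePeakDetector {

    /** Returns the indices of local maxima whose intensity exceeds the threshold. */
    public static java.util.List<Integer> detectPeaks(double[] intensities, double threshold) {
        java.util.List<Integer> peaks = new java.util.ArrayList<>();
        for (int i = 1; i < intensities.length - 1; i++) {
            boolean localMax = intensities[i] > intensities[i - 1]
                    && intensities[i] >= intensities[i + 1];
            if (localMax && intensities[i] > threshold) {
                peaks.add(i); // each retained peak corresponds to a chemical substance
            }
        }
        return peaks;
    }
}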
5. Eclipse DAWNSci
projects.eclipse.org/projects/science.dawnsci
“Visualization, Data Slicing, Tools and Python” www.dawnsci.org
Active Member Companies:
DAWNSci is the public API to DAWN (dawnsci.org). It allows people customizing DAWN to program against a long-term interface, giving them a way to ensure that their extensions to DAWN keep working in the future.
DAWN (DAWNSCI.ORG) does:
• Visualization of Data
• 100s of formats supported
• Integration of plotting + Python
• Fully Scriptable Approach
• Many Scientific Perspectives
• Online Analysis Tools used with Data Acquisition
• Integration with Experimental Definition Database (ISPyB)
6. Eclipse January
projects.eclipse.org/projects/science.january
“NumPy Data Manipulation for Java”
Active Member Companies:
Eclipse January is a set of libraries for handling numerical data in Java. It is inspired in part by NumPy and aims to provide similar functionality.
Why use it?
Familiar. Provides familiar functionality, especially to NumPy users.
Robust. Has a test suite and is used heavily in production at Diamond Light Source.
No more passing double[]. IDataset provides a consistent object for basing APIs on, with significantly improved clarity over using double arrays or similar.
Optimized. Optimized for speed and getting better all the time.
Scalable. Allows handling of data sets larger than available memory with "Lazy Datasets".
Focus on your algorithms. By reusing this library you can focus on your own code.
Taken From: NumpyExamples.java https://github.com/eclipse/january
import org.eclipse.january.dataset.DatasetFactory;
import org.eclipse.january.dataset.DatasetUtils;
import org.eclipse.january.dataset.IDataset;
import org.eclipse.january.dataset.LinearAlgebra;
import org.eclipse.january.dataset.Maths;
import org.eclipse.january.dataset.Random;
import org.eclipse.january.dataset.Slice;

IDataset a = DatasetFactory.createFromObject(new double[]{1,2,3,6,4,5,8,9,7}, 3, 3);
System.out.println("a has rank "+a.getRank());
System.out.println("a has size "+a.getSize());
double val = a.getDouble(-1,-1); // Last element
val = a.getDouble(1,2); // Row 1, column 2
a = Random.rand(new int[]{10, 10}); // Random data, different shape
IDataset set = a.getSliceView(new Slice(1,2)); // Entire row
set = a.getSliceView(new Slice(5)); // First five rows
set = a.getSliceView(new Slice(-5,null)); // Last five rows
set = a.getSliceView(new Slice(0,3), new Slice(4,9)); // Subslice
a = Random.rand(new int[]{21, 10}); // Random data, different shape
set = a.getSliceView(new Slice(null,null,2), null); // Every other row
set = a.getSliceView(new Slice(null,null,-1), null); // Reverse order
IDataset first = a.getSlice(new Slice(0,1), null);
IDataset aplus = DatasetUtils.append(a, first, 0); // Append something
a = DatasetFactory.createFromObject(new double[]{1,2,3,6,4,5,8,9,7}, 3, 3); // Back to 3x3 so shapes match below
IDataset b = DatasetFactory.createFromObject(new double[]{1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9}, 3, 3);
IDataset d = Maths.multiply(a, b); // a.b, element-wise
IDataset[] e = LinearAlgebra.calcEigenDecomposition(a); // Eigen-decomposition
a = Random.rand(new int[]{100, 100}); // Random data, different shape
IDataset s = DatasetUtils.sort(a, 0); // Sort along axis 0
7. Eclipse Rich Beans
projects.eclipse.org/projects/science.richbeans
“UI Binding for Massive Bean Trees and Undefined Sizes”
Active Member Companies:
Eclipse Rich Beans is a widget set for science user interfaces which automatically binds to data (beans) using reflection. The binding layer is more flexible than other technologies in the Eclipse ecosystem because it scales to large, complex bean trees. For instance, it provides widgets for editing lists of beans where a bean in the list contains thousands of data points of information.
• List Support
• Unlimited Nesting Supported
• Wide range of data entry widgets based on SWT
• Scalable to large data trees
• Simple to use OSGi service
• Easy to reuse in any project
• Low dependency design
UI / BEAN (the slide shows the generated UI alongside the bean code below)
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExampleBean {
    private List<ExampleItem> items;
    //…
}

public class ExampleItem {
    private String itemName;
    private ItemChoice choice = ItemChoice.XY;
    private Double x, y;
    private double r, theta;

    public enum ItemChoice {
        XY, POLAR;
        public static Map<String, ItemChoice> names() {
            final Map<String, ItemChoice> ret = new HashMap<String, ItemChoice>(2);
            ret.put("X-Y Graph", XY);
            ret.put("Polar", POLAR);
            return ret;
        }
    }
    //…
}

The same widget set scales to much larger trees, for example a list of 2000 items whose type carries 100 fields:

public class ExampleBean {
    private List<ExampleItem> items;
    //… Example has 2000 items
}

public class ExampleItem {
    private String itemName;
    private ItemChoice choice = ItemChoice.XY;
    private Double x, y;
    private double r, theta;
    private double d0, d1, d2, d3, d4, d5, d6, d7, d8, d9;
    private double d10, d11, d12, d13, d14, d15, d16, d17, d18, d19;
    private double d20, d21, d22, d23, d24, d25, d26, d27, d28, d29;
    private double d30, d31, d32, d33, d34, d35, d36, d37, d38, d39;
    private double d40, d41, d42, d43, d44, d45, d46, d47, d48, d49;
    private double d50, d51, d52, d53, d54, d55, d56, d57, d58, d59;
    private double d60, d61, d62, d63, d64, d65, d66, d67, d68, d69;
    private double d70, d71, d72, d73, d74, d75, d76, d77, d78, d79;
    private double d80, d81, d82, d83, d84, d85, d86, d87, d88, d89;
    private double d90, d91, d92, d93, d94, d95, d96, d97, d98, d99;
    //… Example has 100 fields
}
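The binding layer works by reflecting over fields like these. A minimal sketch of that underlying idea in plain JDK reflection (deliberately not the Rich Beans API, whose widgets and services do the real work):

import java.lang.reflect.Field;

public final class BeanInspector {

    /** Lists each field of a bean; a binding layer would pick a widget per type. */
    public static void describe(Object bean) throws IllegalAccessException {
        for (Field field : bean.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            // e.g. Double -> number box, enum -> combo box, List -> table of rows
            System.out.printf("%s : %s = %s%n",
                    field.getName(),
                    field.getType().getSimpleName(),
                    field.get(bean));
        }
    }
}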
8. Eclipse Scanning
projects.eclipse.org/projects/science.scanning
“Making Moving Scientific Instruments and Writing Data EASY”
Active Member Companies:
Eclipse Scanning is a project for scanning scientific instruments and writing data to HDF5 files. It is designed to be integrated with common control systems such as EPICS and TANGO, but it makes no assumptions about how individual devices are moved.
• Wide Range of Scanning Paths
• OSGi Design
• True nD, unlimited degrees of freedom
• Fast and Multi-threaded
• Outperforms custom Python acquisition scripts
• Eclipse January Supported
• Java 8+
• HDF5 NeXus Compliant
• Easy to reuse in any project
• Low dependency design
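To make the idea of a scanning path concrete, here is a small sketch of a two-axis raster grid generator; the class and the double[]-as-position convention are invented for illustration and are not the Eclipse Scanning API:

import java.util.ArrayList;
import java.util.List;

public final class GridPathSketch {

    /** Generates a raster grid of {x, y} positions; assumes nx, ny >= 2. */
    public static List<double[]> grid(double x0, double x1, int nx,
                                      double y0, double y1, int ny) {
        List<double[]> path = new ArrayList<double[]>();
        for (int j = 0; j < ny; j++) {
            double y = y0 + j * (y1 - y0) / (ny - 1);
            for (int i = 0; i < nx; i++) {
                double x = x0 + i * (x1 - x0) / (nx - 1);
                path.add(new double[] { x, y }); // one detector exposure per point
            }
        }
        return path;
    }
}

A real scan composes paths like this for any number of axes and writes each acquired frame into the NeXus/HDF5 file as it arrives.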
9. Eclipse StatET
projects.eclipse.org/projects/science.statet
“Tooling for the R language”
Active Member Companies: Open Analytics
Eclipse StatET is an Eclipse-based IDE for R. It offers a set of mature tools for R coding and package building. This includes:
• a fully integrated R console
• R script editors
• an integrated R Graphics view
• an object browser to explore the objects R has in memory
• an integrated R Help system
• functionality to interact with multiple local and remote installations of R
• a visual debugger for R
• editors and document processing support for Sweave/knitr and Rmarkdown documents
• support for running R CMD tools as External launch configurations
10. Eclipse TeXlipse
projects.eclipse.org/projects/science.texlipse
“Because LaTeX + Eclipse = Good”
Active Member Companies:
LaTeX is a typesetting system that is widely used by the science community for document preparation and publications. The TeXlipse project provides an Eclipse extension to support LaTeX projects, so that document preparation can be incorporated into normal Eclipse development activities. General LaTeX users will also find that the advanced editing and automatic document generation features of TeXlipse provide a compelling alternative to other LaTeX environments. TeXlipse supports the following features:
• Syntax/semantic editing of LaTeX documents
• Code folding
• Error annotations
• Content assist
• Line wrapping
• Table editor
• BibTeX editing support
• F3 navigation
• File and document outline
• Templates
• Build support (document typesetting)
• Spell checking
• Menu support for common LaTeX symbols
• BibTeX and BibLaTeX support
• Integration of PDF viewers
• Bibsonomy integration
11. Eclipse Triquetrum
projects.eclipse.org/projects/science.triquetrum
“Ptolemy-based Algorithms for Eclipse”
Active Member Companies:
Eclipse Triquetrum delivers an open platform for managing and executing scientific workflows. The goal of Triquetrum is to support a wide range of use cases, ranging from automated processes based on predefined models to replaying ad-hoc research workflows recorded from a user's actions in a scientific workbench UI. It allows users to define and execute models from personal pipelines with a few steps to massive models with thousands of elements.
The integration of a workflow system in a platform for scientific software can bring many benefits:
• the steps in scientific processes are made explicitly visible in the workflow models (instead of being hidden inside program code); such models can serve as a means to present, discuss and share scientific processes in communities with different skill sets
• allows differentiating between roles within a common tool set: software engineers, internal scientists, visiting scientists, etc.
• promotes reuse of software assets and modular solution design
• technical services for automating complex processes in a scalable and maintainable way
• a crucial tool for advanced analytics on gigantic datasets
• integrates execution tracing, provenance data, etc.
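To illustrate what "steps made explicitly visible in the workflow model" means in code (a toy sketch in plain Java, not Triquetrum's Ptolemy-based actor API), a workflow can be held as an ordered map of named steps:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public final class WorkflowSketch {

    // Insertion-ordered, so the model itself documents the process.
    private final Map<String, Function<Object, Object>> steps =
            new LinkedHashMap<String, Function<Object, Object>>();

    public WorkflowSketch step(String name, Function<Object, Object> fn) {
        steps.put(name, fn);
        return this;
    }

    public Object run(Object input) {
        Object value = input;
        for (Map.Entry<String, Function<Object, Object>> e : steps.entrySet()) {
            System.out.println("running step: " + e.getKey()); // natural tracing/provenance hook
            value = e.getValue().apply(value);
        }
        return value;
    }
}

Every step carries a name that can be presented and discussed, and the loop in run() is the single place where execution tracing and provenance data can be recorded.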
12. The Eclipse Integrated Computational Environment
projects.eclipse.org/projects/science.ice
Active Member Companies:
The Eclipse Integrated Computational Environment (ICE) addresses the usability needs of the scientific and engineering community for the Big Four modeling and simulation activities. The focus of the ICE is to develop an easily extended and reusable set of tools that can be used by developers to create rich user interfaces for their modeling and simulation products. Custom widgets and data structures with well-defined interfaces and high-coverage unit tests are provided for plugin developers. Plugins based on the ICE tools are also developed and released with the ICE for codes that the developers use, and the community contributes to the extent that resources allow. The idea is that the tools should be available for developers to do what they do, and the deployment mechanism should be ready and waiting when they are finished.

Each of the big four tasks presents unique challenges, but much of the capability required can be handled by extensions to the existing Eclipse tools.