A quick guide on how to use Matlab netCDF functions
Prepared by HP Huang (hp.huang@asu.edu), Sept 2009. Revised March 2015. Please email HPH for
any questions/comments.
A netCDF file contains two parts: a "header" that describes the names, dimensions, etc., of the variables stored in the file, and the main body that contains the real data. To process a netCDF file, we need to first extract the information in the header and determine what portion/segment of the data we want to use. This is usually done by using a set of stand-alone tools and is, in fact, the more tedious task; we will discuss it in Part (A). Once the content of a netCDF file is known, it is rather straightforward to read the data, as will be discussed in Part (B).
(A) Inspect the content of a netCDF file
We will use the netCDF data file, precip.mon.ltm.nc, as an example to explain how the Matlab functions work. This file, extracted and downloaded from the NOAA ESRL-PSD web portal (www.esrl.noaa.gov/psd), contains the CMAP gridded data for the long-term mean of monthly precipitation for the global domain. It contains the values of precipitation rate on a regular longitude-latitude grid for the climatological means for January, February, ..., December.
Step 0: Display/check the header of the netCDF file
ncdisp('precip.mon.ltm.nc')
The output is listed in Appendix A.
Remark: The ncdisp command "dumps" out the whole header, which contains the essential information
about the variables and their dimensions in the netCDF file. Running ncdisp is equivalent to running
"ncdump -c" (see Appendix B) on Linux-based platforms. It is strongly recommended that the
information in the header be examined before one uses the data in a netCDF file. In fact, just by
carefully inspecting the header, one can skip many of the steps discussed in Steps 2-4 of this section.
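As a side note, ncdisp can also limit its display to a single variable; a minimal sketch, assuming the same file is in the current directory:

ncdisp('precip.mon.ltm.nc', 'precip')   % display only the metadata for the "precip" variable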
Step 1: Open the file
ncid1 = netcdf.open('precip.mon.ltm.nc','NC_NOWRITE')
Remark: This command "opens" a netCDF file named precip.mon.ltm.nc and assigns a file number to
it. This file number is the sole output of the function and is put into our variable, ncid1, for later uses.
The option 'NC_NOWRITE' designates that the file is read-only (that is what we need for now).
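A file opened with netcdf.open should eventually be released with netcdf.close. A minimal sketch of the open/close pairing (the reading steps in between are developed in the rest of this guide):

ncid1 = netcdf.open('precip.mon.ltm.nc','NC_NOWRITE');
% ... inspect the header and read variables here ...
netcdf.close(ncid1);   % release the file handle when done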
Steps 2-4 are often not necessary if one has already obtained the needed information about
specific variables by running ncdisp in Step 0.
Step 2: Inspect the number of variables and number of dimensions, etc., in the file
[ndim, nvar, natt, unlim] = netcdf.inq(ncid1)
ndim = 3
nvar = 4
natt = 6
unlim = 2
Remark: After opening a file and obtaining its "file number", we next inquire what's in the file. The function, netcdf.inq, returns four values: the number of dimensions, the number of variables, the number of global attributes, and the ID of the unlimited dimension. (Let's focus on the first two for now.) From those outputs, we learned that the file, "precip.mon.ltm.nc", contains 4 variables (nvar = 4) and a maximum of 3 dimensions (ndim = 3).
Step 3: Extract further information for each "dimension"
[dimname, dimlength] = netcdf.inqDim(ncid1, 0)
dimname = lon
dimlength = 144
[dimname, dimlength] = netcdf.inqDim(ncid1, 1)
dimname = lat
dimlength = 72
[dimname, dimlength] = netcdf.inqDim(ncid1, 2)
dimname = time
dimlength = 12
Remark: From Step 2, we learned that there can be a maximum of three dimensions for any variable stored in 'precip.mon.ltm.nc'. We then call the function, netcdf.inqDim(ncid, dimid), to ask about the detail of each dimension, one at a time. Note that the parameter, dimid, runs from 0 to 2 instead of 1 to 3 (dimension number zero, dimid = 0, corresponds to the 1st dimension; dimension number one, dimid = 1, corresponds to the 2nd dimension, and so on). This is because Matlab netCDF utilities adopt the convention of the C language, in which the counting of numbers often starts from zero instead of one. Here, we stop at dimension number 2 (or, the 3rd dimension) because from Step 2 we already determined that the maximum number of dimensions is 3. From this step, we know that the coordinate for the first dimension is stored in an array called "lon" with 144 elements, the 2nd dimension is "lat" with 72 elements, and the 3rd dimension is "time" with 12 elements.
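Since Step 2 already gave us the number of dimensions (ndim = 3), the three calls above can also be collapsed into a loop; a minimal sketch:

for dimid = 0:ndim-1
    [dimname, dimlength] = netcdf.inqDim(ncid1, dimid);
    fprintf('dimension %d: %s (%d elements)\n', dimid, dimname, dimlength);
end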
Step 4: Extract further information for each variable
[varname, xtype, dimid, natt] = netcdf.inqVar(ncid1, 0)
varname = lat
xtype = 5
dimid = 1
natt = 3
Remark: In this example, we first extract the name, dimension, etc., of the first variable. Again, for this
function, netcdf.inqVar(ncid1, varid), the parameter "varid" starts from 0 instead of 1. This is a
convention adopted from C language; See remarks for Step 3. The outcome tells us that the 1st
variable is named "lat". It is of type "5" which translates to "float" or "real". (The Matlab
documentation provided by mathworks.com does not have the detail on this point, but the definition of
xtype is available from the documentation for the C interface for netCDF; See p. 41 of that
documentation. Some commonly encountered types are 2 = character, 3 = short integer, 4 = integer, 5
= real, and 6 = double.) Note that if one simply reads the header obtained by running ncdisp in Step 0,
one would immediately see that the "lat" variable is of "real" or "float" type.
The "dimid = 1" in the output tells us that the variable "lat" is a one-dimensional array (because
dimid is a single number, not an array) and the coordinate in that dimension is defined by
"dimension number 1". Recall that in Step 3 the comamnd, [dimname, dimlength] =
netcdf.inqDim(ncid1, 1), returned dimname = lat, dimlength = 72. Therefore, the variable "lat" is an
array with 72 elements, and the coordinate for these 72 elements are defined by the dimension, "lat",
which itself has 72 elements. (Here, "lat" is both a variable and a dimension or coordinate. If this
sounds confusing, keep reading and it will clear later.) Let's not worry about the 4th parameter
("attribution") for now.
[varname, xtype, dimid, natt] = netcdf.inqVar(ncid1, 1)
varname = lon
xtype = 5
dimid = 0
natt = 3
Remark: We find that the second variable is called "lon". It is of "real" type. It is a one-dimensional
array with its coordinate defined by "dimension number zero", i.e., the 1st dimension or "lon", with 144
elements. Again, "lon" is both a "variable" and a "dimension" (or coordinate).
[varname, xtype, dimid, natt] = netcdf.inqVar(ncid1, 2)
varname = time
xtype = 6
dimid = 2
natt = 6
Remark: The third variable is called "time". It is of "double" type (xtype = 6). It is a one-dimensional
array with its coordinate defined by "dimension number 2", i.e., the 3rd dimension or "time", with 12
elements. Again, "time" is both a "variable" and a "dimension" (or coordinate). (This is very common
in netCDF files.)
[varname, xtype, dimid, natt] = netcdf.inqVar(ncid1, 3)
varname = precip
xtype = 5
dimid = 0, 1, 2
natt = 14
Remark: We now obtain the information for the 4th and final variable. (We know there are only 4
variables in this file because nvar = 4 in Step 2.) It is named "precip". It is of "real" type. It is a three-
dimensional variable, because dimid = [0, 1, 2] is an array with three elements. Moreover, its 1st
dimension corresponds to "dimension number zero" given by Step 3, its 2nd dimension corresponds to
"dimension number 1", and its 3rd dimension corresponds to "dimension number 2". Therefore, the
variable "precip" has dimensions (144, 72, 12), with a total of 144 x 72 x 12 elements of real numbers.
The coordinates for the 1st, 2nd, and 3rd dimensions are defined by lon(144), lat(72), and time(12).
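As with the dimensions in Step 3, the four netcdf.inqVar calls above can be condensed into a loop. A
minimal sketch (assuming ncid1 and nvar from the earlier steps):
for varid = 0:nvar-1
    % query the name, type, dimension IDs, and attribute count of
    % each variable in turn
    [varname, xtype, dimid, natt] = netcdf.inqVar(ncid1, varid);
    fprintf('variable %d: %s, xtype = %d, %d attribute(s)\n', ...
            varid, varname, xtype, natt);
end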
Summary
From the above, we learned that there are four variables in the netCDF file, "lon", "lat", "time", and
"precip". The first three are there only to provide the coordinate (or "metadata") for the 4th variable,
which contains the real data that we want to use. Symbolically, the 4th variable is
precip : P(i, j, k), with i = 1-144, j = 1-72, and k = 1-12,
and the first three are
lat : lat(j), j = 1-72; lon : lon(i), i = 1-144; time : t(k), k = 1-12.
The arrangement of the block of data for "precip" is illustrated in Fig. 1.
Fig. 1 [schematic of the "precip" data block: axes lon (i = 1, 2, 3, ..., 144), lat (j = 1, 2, 3, ..., 72), and
time (k = 1, ..., 12: January, February, March, ...)]
Here, we use the "normal" convention with each of the indices, i, j, and k, starting from 1, whereas the
Matlab netCDF functions adopt the C convention such that counting starts from zero. Figure 2 is a
slightly modified diagram that illustrates the actual numbers that we should use to extract the data using
the Matlab functions.
Fig. 2 [the same schematic with zero-based indices: lon (i = 0, 1, 2, 3, ..., 143), lat (j = 0, 1, ..., 71), and
time (k = 0, 1, 2, ..., 11)]
(B) Read a variable from the netCDF file
From Part (A), we now know the content of the netCDF file. We will next use some examples to
illustrate how to read a selected variable, or a "subsection" of a variable, from the file. This is generally
a very straightforward task and only involves calling the function, netcdf.getVar, in a one-line
command. (Read the documentation for that function at mathworks.com. It is useful.)
Example 1: Read the content of the variable "lat" (this is the latitude for the precipitation data)
ncid1 = netcdf.open('precip.mon.ltm.nc','NC_NOWRITE');
lat1 = netcdf.getVar(ncid1,0,0,72)
Result:
88.7500
86.2500
83.7500
81.2500
78.7500
76.2500
...
...
-81.2500
-83.7500
-86.2500
-88.7500
Remark: The first line of the command opens the file and assigns it a file number, which is returned to
ncid1. In the second line, lat1 = netcdf.getVar(ncid1, varid, start, count), we choose
● ncid1 as the outcome from the first line of code, so we know that the file we will be using is
"precip.mon.ltm.nc"
● varid = 0, which means we read variable number zero, or the 1st variable, which is "lat", a real array
with 72 elements; see Step 4 in Part A. As always, remember that the Matlab netCDF functions adopt
the convention of the C language: counting starts from zero instead of one. See the further remark at
the bottom of Step 3 in Part A.
● start = 0, which means we read the segment of the variable starting from its 1st element. (Again,
remember that counting starts from zero.)
● count = 72, which means we read a total of 72 elements, starting from the 1st element (as defined by
start = 0). In other words, we read the array, [lat(1), lat(2), lat(3), ..., lat(71), lat(72)], and put it into our
own variable, lat1, on the left-hand side. As Matlab dumps the content of lat1, we see that the contents
of "lat" in the netCDF file are [88.75, 86.25, 83.75, ..., -86.25, -88.75]. These are the latitudes (from
88.75°N to 88.75°S) for the precipitation data.
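Incidentally, there is no need to hard-code varid = 0 if the variable name is known. A minimal sketch of
an equivalent read (netcdf.inqVarID looks up a variable ID by name; when start and count are omitted,
netcdf.getVar reads the entire variable):
latid = netcdf.inqVarID(ncid1, 'lat'); % returns 0 for this file
lat1 = netcdf.getVar(ncid1, latid); % no start/count: read all 72 elements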
Example 2: Read the full 144 x 72 global precipitation field for time = 1 (i.e., January climatology),
then make a contour plot of it
ncid1 = netcdf.open('precip.mon.ltm.nc','NC_NOWRITE');
precJanuary = netcdf.getVar(ncid1,3,[0 0 0],[144 72 1]);
lon1 = netcdf.getVar(ncid1,1,0,144);
lat1 = netcdf.getVar(ncid1,0,0,72);
for p = 1:144
for q = 1:72
% -- the following 3 lines provide a quick way to remove missing values --
if abs(precJanuary(p,q)) > 99
precJanuary(p,q) = 0;
end
% ----------------------------------------------------------------
map1(q,p) = precJanuary(p,q);
end
end
contour(lon1,lat1,map1)
Result:
Remarks: Only the first 4 lines of the code are related to reading the netCDF file. The rest are
commands for post-processing and plotting the data. In the second line,
precJanuary = netcdf.getVar(ncid1, varid, start, count) ,
we choose ncid1 from the outcome of the first line of code, so we know that the file we are using is
"precip.mon.ltm.nc". In addition, varid = 3, which means we read the 4th variable, "precip"; start = [0
0 0], which indicates that we read the 1st, 2nd, and 3rd dimensions from i = 0, j = 0, and k = 0 as
illustrated in Fig. 2; count = [144 72 1], which indicates that we actually read i = 0-143 (the total count
of grid points is 144), j = 0-71 (total count is 72), and k = 0-0 (total count is 1) in Fig. 2 (which
correspond to i = 1-144, j = 1-72, k = 1-1 in Fig. 1). Essentially, we just cut a 144 x 72 slice of the data
with k = 1.
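The same start and count mechanism selects any other month. For instance, a minimal sketch that
reads the July climatology (month number six in the C-style counting of Fig. 2, i.e., the 7th time level;
ncid1 as above):
precJuly = netcdf.getVar(ncid1, 3, [0 0 6], [144 72 1]);
% start = [0 0 6]: begin at the 1st lon, the 1st lat, and the 7th
% time level; count = [144 72 1]: read one full 144 x 72 slice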
Example 3: Read the full 144 x 72 global precipitation field for time = 1 (i.e., January climatology),
then make a color+contour plot of it
ncid1 = netcdf.open('precip.mon.ltm.nc','NC_NOWRITE');
precJanuary = netcdf.getVar(ncid1,3,[0 0 0],[144 72 1],'double');
lon1 = netcdf.getVar(ncid1,1,0,144);
lat1 = netcdf.getVar(ncid1,0,0,72);
for p = 1:144
for q = 1:72
if abs(precJanuary(p,q)) > 99
precJanuary(p,q) = 0;
end
map1(q,p) = precJanuary(p,q);
end
end
pcolor(lon1,lat1,map1)
shading interp
colorbar
hold on
contour(lon1,lat1,map1,'k-')
Result:
Remark: This example is almost identical to Example 2, but note that in the second line,
precJanuary = netcdf.getVar(ncid1, varid, start, count, output_type),
we have an extra parameter, output_type = 'double'. This converts the outcome (which is put
into our variable, precJanuary) from "real" to "double" type. This is needed in this example because
the function, pcolor (for making a color map), requires that its input be of "double" type. [Typically,
for a 32-bit machine, variables of "real" and "double" types contain 4 bytes (or 32 bits) and 8 bytes (or
64 bits), respectively. This detail is not critical.] Without the extra parameter for the conversion, as is
the case in Example 2, Matlab would return the values of the variable in their original type, which is
"real". (We know from Step 4 in Part (A) that xtype = 5 for this variable.)
Appendix A. The header of a netCDF file as the output from the example in Step 0
Executing the command, ncdisp('precip.mon.ltm.nc'), would produce the following detailed output,
which is the header of the netCDF file, precip.mon.ltm.nc.
Source:
m:\small_project\precip.mon.ltm.nc
Format:
classic
Global Attributes:
Conventions = 'COARDS'
title = 'CPC Merged Analysis of Precipitation (excludes NCEP Reanalysis)'
history = 'created 12/98 by CASData version v207'
platform = 'Analyses'
source = 'ftp ftp.cpc.ncep.noaa.gov precip/cmap/monthly'
documentation = 'http://www.cdc.noaa.gov/cdc/data.cmap.html'
Dimensions:
lon = 144
lat = 72
time = 12 (UNLIMITED)
Variables:
lat
Size: 72x1
Dimensions: lat
Datatype: single
Attributes:
units = 'degrees_north'
actual_range = [8.88e+01 -8.88e+01]
long_name = 'Latitude'
lon
Size: 144x1
Dimensions: lon
Datatype: single
Attributes:
units = 'degrees_east'
long_name = 'Longitude'
actual_range = [1.25e+00 3.59e+02]
time
Size: 12x1
Dimensions: time
Datatype: double
Attributes:
units = 'hours since 1-1-1 00:00:0.0'
long_name = 'Time'
delta_t = '0000-01-00 00:00:00'
actual_range = [0.00e+00 8.02e+03]
avg_period = '0000-01-00 00:00:00'
ltm_range = [1.73e+07 1.75e+07]
precip
Size: 144x72x12
Dimensions: lon,lat,time
Datatype: single
Attributes:
long_name = 'Average Monthly Rate of Precipitation'
valid_range = [0.00e+00 5.00e+01]
units = 'mm/day'
add_offset = 0
scale_factor = 1
actual_range = [0.00e+00 3.41e+01]
missing_value = -9.97e+36
precision = 3.28e+04
least_significant_digit = 2
var_desc = 'Precipitation'
dataset = 'CPC Merged Analysis of Precipitation Standard'
level_desc = 'Surface'
statistic = 'Mean'
parent_stat = 'Mean'
Appendix B: Obtaining the header using Linux-based utilities
The header of a netCDF file can also be extracted by using Linux-based utilities. The most commonly
used tool is "ncdump". The Linux equivalent of running "ncdisp('precip.mon.ltm.nc')" is
ncdump -c precip.mon.ltm.nc
Note that the "-c" option is essential. With it, the command dumps only the header, together with the
values of the coordinate variables; without it, the command will dump all data in the netCDF file. (The
"-h" option goes one step further and dumps the header alone.)
Result:
netcdf precip.mon.ltm {
dimensions:
lon = 144 ;
lat = 72 ;
time = UNLIMITED ; // (12 currently)
variables:
float lat(lat) ;
lat:units = "degrees_north" ;
lat:actual_range = 88.75f, -88.75f ;
lat:long_name = "Latitude" ;
float lon(lon) ;
lon:units = "degrees_east" ;
lon:long_name = "Longitude" ;
lon:actual_range = 1.25f, 358.75f ;
double time(time) ;
time:units = "hours since 1-1-1 00:00:0.0" ;
time:long_name = "Time" ;
time:delta_t = "0000-01-00 00:00:00" ;
time:actual_range = 0., 8016. ;
time:avg_period = "0000-01-00 00:00:00" ;
time:ltm_range = 17338824., 17530944. ;
float precip(time, lat, lon) ;
precip:long_name = "Average Monthly Rate of Precipitation" ;
precip:valid_range = 0.f, 50.f ;
precip:units = "mm/day" ;
precip:add_offset = 0.f ;
precip:scale_factor = 1.f ;
precip:actual_range = 0.f, 34.05118f ;