Introductory TensorFlow Tutorial
– Comparing analysis frameworks with a deep learning approach
– Computational graph concepts
– Getting started with TensorFlow
– Core TensorFlow concepts such as placeholder, variable, session, and operation
– Formulating and solving a simple problem with TensorFlow
From Word Embeddings To Document Distances
We present the Word Mover’s Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local cooccurrences in sentences. The WMD distance measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to “travel” to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover’s Distance, a well studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straight-forward to implement. Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates.
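A minimal sketch of the idea, for the special case of two documents with equally many distinct words and uniform (nBOW) word weights: with equal uniform marginals, an optimal transport plan is a one-to-one matching of words (a vertex of the transportation polytope), so for tiny vocabularies it can be found by brute force. The general case needs a full Earth Mover's Distance solver, and the embeddings passed in below are assumed to be precomputed word vectors.

```python
import itertools
import math

def wmd_equal(doc1, doc2, emb):
    """Word Mover's Distance between two word lists with equally many
    distinct words and uniform (nBOW) weights. With equal uniform
    marginals the optimal transport reduces to a one-to-one matching,
    searched here by brute force over permutations."""
    w1, w2 = sorted(set(doc1)), sorted(set(doc2))
    assert len(w1) == len(w2), "general case needs a full EMD solver"
    n = len(w1)

    def d(a, b):  # Euclidean distance between embedded words
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb[a], emb[b])))

    # each word carries weight 1/n, so the transport cost is the mean
    # distance over the best matching
    return min(sum(d(w1[i], w2[p[i]]) for i in range(n)) / n
               for p in itertools.permutations(range(n)))
```

With toy 2-D "embeddings" such as `{"a": (0.0, 0.0), "b": (3.0, 4.0)}`, identical documents get distance 0 and `["a"]` vs `["b"]` gets the embedding distance 5.0.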
This document discusses periodic functions and Fourier series. A periodic function repeats its values over regular intervals called periods. The Fourier series represents periodic functions as the sum of trigonometric functions (sines and cosines) with different frequencies. The document derives the formulas to calculate the coefficients of the Fourier series from a given periodic function. It involves integrating the function multiplied by sines and cosines over one period of the function.
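The coefficient formulas the document derives can be stated compactly (using one common convention, for a function f with period T):

```latex
f(t) = \frac{a_0}{2}
     + \sum_{n=1}^{\infty}\left( a_n \cos\frac{2\pi n t}{T}
                               + b_n \sin\frac{2\pi n t}{T} \right),
\qquad
a_n = \frac{2}{T}\int_{0}^{T} f(t)\,\cos\frac{2\pi n t}{T}\,dt,
\qquad
b_n = \frac{2}{T}\int_{0}^{T} f(t)\,\sin\frac{2\pi n t}{T}\,dt .
```

Setting n = 0 in the cosine formula gives a_0, so the constant term a_0/2 is the mean value of f over one period, matching the "integrate f multiplied by sines and cosines over one period" derivation the document describes.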
This document summarizes the Banker's Algorithm, which is used to determine if a set of pending processes can safely acquire resources or if they should wait due to limited resources. It outlines the key data structures used like Available, Max, Allocation, and Need matrices to track current resources. The Safety Algorithm is described to check if the system is in a safe state by finding a process that can terminate and release resources. The Resource-Request Algorithm simulates allocating resources to a process and checks if it leads to a safe state before actual allocation.
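The Safety Algorithm described above can be sketched in a few lines; this is a minimal illustration (with `need` assumed to be precomputed as Max minus Allocation, per the matrices the document names), not the full Resource-Request Algorithm:

```python
def is_safe(available, allocation, need):
    """Banker's Safety Algorithm: repeatedly look for a process whose
    remaining need fits within the currently available resources, let it
    run to completion, and reclaim its allocation. The state is safe iff
    every process can finish this way."""
    work = list(available)               # resources currently free
    finished = [False] * len(allocation)
    safe_sequence = []
    progress = True
    while progress:
        progress = False
        for i in range(len(allocation)):
            if not finished[i] and all(n <= w for n, w in zip(need[i], work)):
                # process i can terminate; reclaim its allocation
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                safe_sequence.append(i)
                progress = True
    return all(finished), safe_sequence
```

The Resource-Request Algorithm then amounts to tentatively subtracting a request from `available` and `need[i]`, adding it to `allocation[i]`, and granting the request only if `is_safe` still returns True.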
Microservices architecture involves many services distributed over the network, which introduces many more ways to fail. This session will cover the available tools that can help you when designing and building such distributed systems in Go.
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo... (DataWorks Summit)
Apache Hadoop 3.x ushers in major architectural advances such as erasure coding in HDFS, containerized workload flexibility, GPU resource pooling and a litany of other features. These enhancements help drive real benefits when combined with high-speed, high-capacity solid state drives (SSDs).
Micron is a user of Apache Hadoop as well as an innovator in next-gen IT architecture, pushing the envelope on flash storage with the latest 3D NAND. Micron labs have benchmarks showing that adding a single SSD to existing HDD-based cluster nodes can deliver 200% faster Hadoop, at a fraction of the cost of new nodes.
In this session, Micron and Hortonworks will show real-world results demonstrating the tangible benefits of Apache Hadoop 3.x combined with the latest in non-volatile storage and an updated IT infrastructure with NVMe™ solid state drives in well-designed platforms. We will explore specific workloads and application acceleration by combining Apache Hadoop 3.x with SSDs to build analytics platforms that provide a sustainable competitive advantage for many applications to deliver a combination of low latency, high-performance active archives with better results and reduced storage overhead.
Speakers
Saumitra Buragohain, Hortonworks, Sr. Director, Product Management
Mike Cunliffe, Micron Technology, Data Management Architect
Trino (formerly known as PrestoSQL) is an open source distributed SQL query engine for running fast analytical queries against data sources of all sizes. Some key updates since being rebranded from PrestoSQL to Trino include new security features, language features like window functions and temporal types, performance improvements through dynamic filtering and partition pruning, and new connectors. Upcoming improvements include support for MERGE statements, MATCH_RECOGNIZE patterns, and materialized view enhancements.
The Transformer is an established architecture in natural language processing, built around a self-attention mechanism within a deep learning framework.
This presentation was delivered under the mentorship of Mr. Mukunthan Tharmakulasingam (University of Surrey, UK), as a part of the ScholarX program from Sustainable Education Foundation.
Timelines at Scale (Raffi Krikorian, VP of Engineering at Twitter) (Chris Bolman)
Presentation by Raffi Krikorian, VP of Engineering at Twitter, on scaling Twitter to over 150 million active users with Redis and other architectural approaches.
Available at: https://github.com/dbsmasters/bdsmasters
The current project is implemented in the context of the course "Big Data Management Systems" taught by Prof. Chatziantoniou in the Department of Management Science and Technology (AUEB). The aim of the project is to familiarize the students with big data management systems such as Hadoop, Redis, MongoDB and Azure Stream Analytics.
The branch-and-bound method is used to solve optimization problems by systematically evaluating potential solutions through traversing a state space tree. It improves on backtracking by not limiting the traversal order and using bounds to prune unpromising nodes. For the traveling salesperson problem, an initial tour provides an upper bound, local information at each node gives a lower bound, and nodes are expanded in best-first order until an optimal tour is found or proven impossible.
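The scheme above can be sketched for a tiny TSP instance. This is a minimal best-first branch-and-bound, assuming a simple (weaker than typical textbook) lower bound: cost so far plus each not-yet-departed city's cheapest outgoing edge.

```python
import heapq

def tsp_bnb(dist):
    """Best-first branch and bound for the travelling salesperson problem
    on a small distance matrix. The trivial tour 0,1,...,n-1,0 supplies an
    initial upper bound; each node's lower bound is its cost so far plus,
    for every city not yet departed from, that city's cheapest outgoing edge."""
    n = len(dist)
    cheapest = [min(d for j, d in enumerate(row) if j != i)
                for i, row in enumerate(dist)]
    best = sum(dist[i][(i + 1) % n] for i in range(n))  # initial upper bound
    best_tour = list(range(n)) + [0]

    def bound(cost, path):
        # every city except those already departed from still needs an edge
        departed = set(path[:-1])
        return cost + sum(cheapest[i] for i in range(n) if i not in departed)

    heap = [(bound(0, [0]), 0, [0])]
    while heap:
        b, cost, path = heapq.heappop(heap)
        if b >= best:
            continue  # prune: this subtree cannot beat the best tour so far
        if len(path) == n:
            tour_cost = cost + dist[path[-1]][0]  # close the tour
            if tour_cost < best:
                best, best_tour = tour_cost, path + [0]
            continue
        for nxt in range(n):
            if nxt not in path:
                c = cost + dist[path[-1]][nxt]
                child = path + [nxt]
                nb = bound(c, child)
                if nb < best:  # expand only promising children
                    heapq.heappush(heap, (nb, c, child))
    return best, best_tour
```

On the classic 4-city symmetric instance with distances 10/15/20/35/25/30, the trivial tour costs 95 and the search improves it to the optimal 80.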
Introducing Apache Giraph for Large Scale Graph Processing (sscdotopen)
This document introduces Apache Giraph, an open source implementation of Google's Pregel framework for large scale graph processing. Giraph allows for distributed graph computation using the bulk synchronous parallel (BSP) model. Key points:
- Giraph uses the vertex-centric programming model where computation is defined in terms of messages passed between vertices.
- It runs on Hadoop and uses its master-slave architecture, with the master coordinating workers that hold vertex partitions.
- PageRank is given as an example algorithm, where each vertex computes its rank based on messages from neighbors in each superstep until convergence.
- Giraph handles fault tolerance, uses ZooKeeper for coordination, and allows graph algorithms
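The vertex-centric PageRank example above can be sketched as a sequential simulation of the BSP model; this is an illustration of the programming model, not Giraph's actual Java API, and it assumes every vertex has at least one out-edge:

```python
def pregel_pagerank(graph, supersteps=30, d=0.85):
    """PageRank in the vertex-centric Pregel/Giraph style, simulated
    sequentially. Each superstep: every vertex folds its incoming
    messages into a new rank, then sends rank/out_degree to each
    out-neighbour. `graph` maps vertex -> list of out-neighbours."""
    n = len(graph)
    value = {v: 1.0 / n for v in graph}     # initial uniform rank
    messages = {v: [] for v in graph}
    for step in range(supersteps):
        if step > 0:
            # compute(): combine received messages into the new rank
            value = {v: (1 - d) / n + d * sum(messages[v]) for v in graph}
        # send phase: message each out-neighbour with our rank share
        messages = {v: [] for v in graph}
        for v, out in graph.items():
            share = value[v] / len(out)
            for u in out:
                messages[u].append(share)
    return value
```

On a 3-cycle every vertex keeps rank 1/3, the fixed point of the update.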
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter (DataWorks Summit)
- Profiling Hadoop jobs at Twitter revealed that compression/decompression of intermediate data and deserialization of complex object keys were very expensive. Optimizing these led to performance improvements of 1.5x or more.
- Using columnar file formats like Apache Parquet allows reading only needed columns, avoiding deserialization of unused data. This led to gains of up to 3x.
- Scala macros were developed to generate optimized implementations of Hadoop's RawComparator for common data types, avoiding deserialization for sorting.
Learn how Cloudera Impala empowers you to:
- Perform interactive, real-time analysis directly on source data stored in Hadoop
- Interact with data in HDFS and HBase at the “speed of thought”
- Reduce data movement between systems & eliminate double storage
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you to understand Basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
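Topic 12 above has a classic answer: no, you cannot average directly with reduce(), because the mean is not associative and Spark may combine partition results in any order. The standard fix, shown here in plain Python with `functools.reduce` to mirror the RDD semantics, is to reduce (sum, count) pairs and divide once at the end:

```python
from functools import reduce

nums = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# Reducing with a direct "average of two values" function is wrong:
# avg(avg(a, b), c) != avg(a, avg(b, c)). Instead, reduce pairs of
# (running_sum, running_count), which IS associative and commutative,
# then divide once.
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]),
                      [(x, 1) for x in nums])
average = total / count
```

The same pattern works unchanged as `rdd.map(lambda x: (x, 1)).reduce(...)` on a real Spark RDD.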
Mastering the game of go with deep neural networks and tree search (SanFengChang)
AlphaGo uses a combination of deep neural networks and tree search to master the game of Go. It has two neural networks - a policy network that selects moves and a value network that evaluates board positions. The policy network is trained by human expert data and reinforcement learning from self-play. During games, AlphaGo uses Monte Carlo tree search guided by the neural networks to select moves. AlphaGo defeated professional Go players due to this powerful combination of techniques, demonstrating superhuman playing strength at Go.
ETL in tf.data
ImageDataGenerator (Keras) vs. tf.data
tf.function, XLA, mixed precision and snapshot in TF
Video [Persian]: https://www.aparat.com/v/HGvC2
Code and materials: https://github.com/Alireza-Akhavan/class.vision/tree/master/tf2
Purchase the course:
http://class.vision/deep-face-recognition/
One-shot learning, face verification and recognition
Siamese network
Discriminative Feature
Facenet paper and face embedding
metric learning for face: triplet loss, center loss, sphereface, arcface & amsoftmax
face detection and landmark detection
full face recognition pipeline
https://github.com/Alireza-Akhavan/deep-face-recognition/
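The triplet loss from the metric-learning topics above can be sketched in NumPy; this is a minimal illustration of the FaceNet-style formulation, not the course's actual implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss in the FaceNet style: the anchor embedding should be
    closer (in squared Euclidean distance) to the positive (same identity)
    than to the negative (different identity) by at least `margin`.
    Loss is zero once the margin is satisfied."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(pos_dist - neg_dist + margin, 0.0)
```

When the positive coincides with the anchor and the negative is far away, the loss is 0; swapping positive and negative makes the loss positive, which is what drives the embedding apart during training.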
Thursday, August 30, 2018
Introductory Deep Learning Course
Fundamentals of machine learning and supervised learning
Working with image data in Python
Implementing a simple classifier in Python
In this session we covered the basic concepts of neural networks and deep learning. Image data was introduced as a special type of data. We then introduced and examined activation functions, multi-layer perceptron (MLP) models, the various cost functions (loss functions) used in neural networks, and how neural networks are trained.
We also explored the TensorFlow Playground, discussing topics such as the number of layers and neurons, the choice of activation function, and their effect on training. The session then introduced the Keras library in Python and the implementation of a simple neural network, and discussed the over-fitting problem and Dropout as one regularization technique.
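The activation functions and MLP forward pass covered in this session can be sketched in NumPy; the weights below are arbitrary illustration values, not trained parameters:

```python
import numpy as np

# Common activation functions discussed in the session
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

# Forward pass of a single-hidden-layer MLP: an affine map, a
# nonlinearity, then another affine map and an output squashing.
W1 = np.array([[1.0, -1.0],
               [0.5,  0.5]])
b1 = np.zeros(2)
W2 = np.array([2.0, -2.0])
b2 = 0.0

x = np.array([1.0, 2.0])   # input features
h = relu(W1 @ x + b1)      # hidden layer activations
y = sigmoid(W2 @ h + b2)   # output squashed into (0, 1)
```

Removing the nonlinearity would collapse the two layers into one linear map, which is exactly why activation functions matter, as the TensorFlow Playground experiments in the session show.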
A Comparative Study of varying parameters in invariant object recognition at ... (Alireza AkhavanPour)
In this study, after preparing a dataset containing various image transformations, the effect of each image parameter on object recognition was examined in humans and in AlexNet, a deep computational model.
Rapid object recognition in humans
The brain's visual cortex and the anterior pathway
Object transformations and invariant object recognition
Good and bad representations
Computational models
Convolutional neural networks