Google Dev Summit Extended Seoul - TensorFlow: Tensorboard & Keras
1. Taegyun Jeon
TensorFlow Dev Summit Extended Seoul / 2017.02.22
R&D Center, Satrec Initiative
TensorFlow:
TensorBoard & Keras
GDG Meetup in February
2. Contents
Integrating Keras & TensorFlow: The Keras Workflow, Expanded
▫ Speaker: Francois Chollet
Hands-on TensorBoard
▫ Speaker: Dandelion Mané
▫ Code & Slide: https://goo.gl/San2uR
All contents are provided from TensorFlow Dev Summit 2017
(https://events.withgoogle.com/tensorflow-dev-summit/)
[TensorFlow Dev Summit Extended] TensorFlow: TensorBoard & Keras Page 2
3. Keras
An API spec for building deep learning models across many platforms
4. Why is Keras so popular?
Ease of use
Conciseness
Frequent API changes in TF
A communication tool between researchers and developers
5. tf.keras
Keras is finally being folded into TensorFlow!
TensorFlow layers = Keras layers
Keras Model
▫ Sequential model and functional model API
Can be integrated with TensorFlow features
https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment
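The two Keras model APIs named on this slide can be compared side by side. The following is a minimal sketch, assuming a TensorFlow release in which `tf.keras` is available (the slides place it in a release after the talk); the layer sizes, the 784-feature input, and the helper names `build_sequential` and `build_functional` are illustrative assumptions, not taken from the talk.

```python
import tensorflow as tf


def build_sequential(num_features: int = 784, num_classes: int = 10) -> tf.keras.Model:
    """Sequential API: a linear stack of layers, declared in order."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


def build_functional(num_features: int = 784, num_classes: int = 10) -> tf.keras.Model:
    """Functional API: layers are called on tensors, so graphs can branch or merge."""
    inputs = tf.keras.Input(shape=(num_features,))
    hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(hidden)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


# Both builders describe the same network (hypothetical sizes).
sequential_model = build_sequential()
functional_model = build_functional()

for model in (sequential_model, functional_model):
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
```

The Sequential form is shorter for a plain stack of layers; the functional form is the one that extends to multi-input models such as the video QA example shown later in the deck.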
6. What changes for Keras users?
You no longer have to agonize over choosing between TensorFlow and Keras.
Understand the strengths and weaknesses of TF and Keras, and mix the two.
Distributed Training, Cloud ML, Hyperparameter setting, TF-Serving
https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html
7. Keras Example: Video QA
8. Keras Example: Video QA
9. Keras Example: Video QA
10. Keras Example: Video QA
11. Keras Example: Video QA
12. Keras Example: Video QA
14. Keras Example: Video QA
15. Keras Example: Visual VQA
16. More Examples
Keras Blog
▫ https://blog.keras.io/
Keras API (keras.applications)
▫ https://keras.io/applications/
Keras Example Directories
▫ https://github.com/fchollet/keras/tree/master/examples
17. Keras: Summary
TF users: gain an easy-to-use high-level API
Keras users: can build only the model in Keras and run everything afterwards in TF
Planned version changes
▫ tf.contrib.keras (TF 1.1 / around March 2017)
▫ tf.keras (TF 1.2)
18. TensorBoard
Benefits of using TensorBoard
▫ Debug: visually inspect deep learning models whose internals are otherwise hard to understand
▫ Hyperparameter Tuning
▫ Visualize inference results
46. TensorBoard: MNIST Example
When searching for hyperparameters:
set only a small number of epochs,
look only at the early performance,
then manage checkpoints and continue training
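The workflow in these notes (probe each setting for a few epochs, compare early performance, then resume the most promising run from its checkpoint) can be sketched without any framework. Everything below is a toy illustration under stated assumptions: the quadratic `loss_fn`, the `Checkpoint` class, and the `train` and `short_search` helpers are hypothetical stand-ins, not the MNIST code from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a saved model checkpoint."""
    weight: float
    epochs_done: int


def loss_fn(weight: float) -> float:
    # Toy objective with its minimum at weight == 3.0.
    return (weight - 3.0) ** 2


def train(lr: float, epochs: int, ckpt: Optional[Checkpoint] = None) -> Checkpoint:
    """Run plain gradient descent for `epochs` steps, resuming from `ckpt` if given."""
    weight = ckpt.weight if ckpt else 0.0
    done = ckpt.epochs_done if ckpt else 0
    for _ in range(epochs):
        grad = 2.0 * (weight - 3.0)
        weight -= lr * grad
    return Checkpoint(weight=weight, epochs_done=done + epochs)


def short_search(
    learning_rates: Iterable[float], probe_epochs: int = 3
) -> Dict[float, Tuple[float, Checkpoint]]:
    """Train every candidate briefly and record its early loss and checkpoint."""
    results = {}
    for lr in learning_rates:
        ckpt = train(lr, probe_epochs)
        results[lr] = (loss_fn(ckpt.weight), ckpt)
    return results


# Step 1: many short runs, judged on early performance only.
results = short_search([0.001, 0.01, 0.1])
best_lr = min(results, key=lambda lr: results[lr][0])
best_loss, best_ckpt = results[best_lr]

# Step 2: continue training the best candidate from its checkpoint.
final_ckpt = train(best_lr, epochs=50, ckpt=best_ckpt)
```

In a real setup each short run would write to its own log directory so that the candidates can be compared in TensorBoard before deciding which checkpoint to resume.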
51. TensorBoard: Summary
Debug
▫ Make it a habit to define name_scopes and to name your tensors
▫ Use the summary features
• Scalar, Image, Audio, Histogram
Hyperparameter search
▫ Running many different short runs is recommended
▫ Applies not only to parameters but also to the model architecture
Embedding visualization
▫ Visualize what is hard to express otherwise!
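The debugging advice above (name scopes plus scalar and histogram summaries) might look roughly like the following. This is a sketch that assumes the TensorFlow 2.x `tf.summary` API, which differs from the 1.x API that was current when the talk was given; the log directory, the `train` scope name, and the fake loss and weight values are placeholders invented for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical log directory; in practice use one directory per run.
logdir = tempfile.mkdtemp(prefix="tb_demo_")
writer = tf.summary.create_file_writer(logdir)

with writer.as_default():
    for step in range(5):
        # Placeholder values standing in for a real training loop.
        fake_loss = 1.0 / (step + 1)
        fake_weights = tf.random.normal([100])

        # Summary tags are prefixed by the active name scope,
        # so these appear grouped under "train" in TensorBoard.
        with tf.name_scope("train"):
            tf.summary.scalar("loss", fake_loss, step=step)
            tf.summary.histogram("weights", fake_weights, step=step)

writer.flush()

# The writer stores its data in event files inside the log directory.
event_files = [
    name for name in os.listdir(logdir) if name.startswith("events.out.tfevents")
]
```

The result can be inspected by pointing TensorBoard at the directory, for example with `tensorboard --logdir` followed by the path stored in `logdir`.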
52. Q & A
Any Questions?