Approximate Vector Search at Scale, With Application to Image Search - SciPY ...Wakana Nogami
Mercari provides an image search feature, which makes it possible for users to find similar items by image. This talk describes how we implemented similar image search over 100s of millions of images, in a way that is accurate. We will also highlight the techniques we used to keep the system efficient and update to date.
Apache Spark Streaming: Architecture and Fault ToleranceSachin Aggarwal
Agenda:
• Spark Streaming Architecture
• How different is Spark Streaming from other streaming applications
• Fault Tolerance
• Code Walk through & demo
• We will supplement theory concepts with sufficient examples
Speakers :
Paranth Thiruvengadam (Architect (STSM), Analytics Platform at IBM Labs)
Profile : https://in.linkedin.com/in/paranth-thiruvengadam-2567719
Sachin Aggarwal (Developer, Analytics Platform at IBM Labs)
Profile : https://in.linkedin.com/in/nitksachinaggarwal
Github Link: https://github.com/agsachin/spark-meetup
Approximate Vector Search at Scale, With Application to Image Search - SciPY ...Wakana Nogami
Mercari provides an image search feature, which makes it possible for users to find similar items by image. This talk describes how we implemented similar image search over 100s of millions of images, in a way that is accurate. We will also highlight the techniques we used to keep the system efficient and update to date.
Apache Spark Streaming: Architecture and Fault ToleranceSachin Aggarwal
Agenda:
• Spark Streaming Architecture
• How different is Spark Streaming from other streaming applications
• Fault Tolerance
• Code Walk through & demo
• We will supplement theory concepts with sufficient examples
Speakers :
Paranth Thiruvengadam (Architect (STSM), Analytics Platform at IBM Labs)
Profile : https://in.linkedin.com/in/paranth-thiruvengadam-2567719
Sachin Aggarwal (Developer, Analytics Platform at IBM Labs)
Profile : https://in.linkedin.com/in/nitksachinaggarwal
Github Link: https://github.com/agsachin/spark-meetup
Angular 2 is now in Beta. Reactive Programming techniques are used to write the Angular 2 framework. The same APIs and techniques are exposed to Angular 2 developers for application development.
This talk, by Geoff Filippi, will start with a brief overview of Reactive Programming. We will define and discuss the Observer Design Pattern. Observables are implemented in RxJS and are under consideration for standardization in ES2016. We will compare Observables, Promises, Events and callbacks. We will also discuss how Promises, Events and callbacks can be bridged into Observables.
Finally we will discuss how RxJS is used to implement Angular 2. We will explore how Observables are used in change detection and ngZone, Http, Async Facade and Forms. We will also discuss how to make use of RxJS and Observables in our Angular 2 applications.
download for better quality - Learn about the sequence and traverse functions
through the work of Runar Bjarnason and Paul Chiusano, authors of Functional Programming in Scala https://www.manning.com/books/functional-programming-in-scala, and others (Martin Odersky, Derek Wyatt, Adelbert Chang)
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015NAVER / MusicPlatform
youtube : https://youtu.be/E_Bgv9upahI
비동기 이벤트 기반의 라이브러리로만 생각 했던 RxJava가 지금 이 시대 프로그래머에게 닥쳐 올 커다란 메시지라는 사실을 알게 된 지금. 현장에서 직접 느낀 RxJava의 본질인 Function Reactive Programming(FRP)에 대해 우리가 잘 아는 Java 이야기로 풀어 보고 ReactiveX(RxJava) 개발을 위한 서버 환경에 대한 이해와 SpringFramework, Netty에서의 RxJava를 어떻게 이용 하고 개발 했는지 공유 하고자 합니다.
How to Extend Apache Spark with Customized OptimizationsDatabricks
There are a growing set of optimization mechanisms that allow you to achieve competitive SQL performance. Spark has extension points that help third parties to add customizations and optimizations without needing these optimizations to be merged into Apache Spark. This is very powerful and helps extensibility. We have added some enhancements to the existing extension points framework to enable some fine grained control. This talk will be a deep dive at the extension points that is available in Spark today. We will also talk about the enhancements to this API that we developed to help make this API more powerful. This talk will be of benefit to developers who are looking to customize Spark in their deployments.
My talk at Lucene/Solr Revolution 2017, Las Vegas
The improved plugin system being proposed in this talk utilizes PF4J to add bundle packaging (zip/jar), plugin discovery (repositories), one-line install/upgrade and automatic version compatibility checks. Think of it as Homebrew or Apt-Get for Solr :) The hope is that this will encourage hundreds of new plugins being created and thus give Solr developers a sense of community and a new “stage” to perform on.
Practical non blocking microservices in java 8Michal Balinski
How to write application in Java 8 that do not waste resources and which can maximize effective utilization of CPU/RAM. Comparison of blocking and non-blocking approach for I/O and application services. Based on microservices implementing simple business logic in security/cryptography/payments domain. Demonstration of following aspects:
* NIO at all edges of application
* popular libraries that support NIO
* single instance scalability
* performance metrics (incl. throughput and latency)
* resources utilization
* code readability with CompletableFuture
* application maintenance and debugging
All above based on our experiences gathered during development of software platforms at Oberthur Technologies R&D Poland.
Learning log4j for Java beginners with a sample set of projects using log4j 1.2.
This was done for Level 2 computer engineering students at University of Moratuwa 2015.
I have hosted the samples in github (https://github.com/lkamal/log4j-workshop
), so that you will be able to download and try yourself.
When developing applications we have a hard time managing application state, and that is okay because managing application state is hard. We will try to make it easier using Redux.
Redux is predictable state management container for JavaScript applications that helps us manage our state while also making our state mutations predictable.
Through the presentation and code, I will show you how I solved my state problem with Redux in React application.
1) Total order sorting is another kind of sorting technique, where map output keys are sorted across all the reducers.
2) This technique uses, where you want to extract the most popular URLs from a web graph.
1) By default Mapreduce uses HashPartitioner as its Partitioner class, which partitions using a hash of the map output keys.
2) Also HashPartitioner ensures that all records with the same map output key goes to the same reducer, but it doesn’t perform total sorting of the map output keys across all the reducers.
3) For this reason only TotalOrderPartitioner class is introduced, which is by default packed with the Hadoop distribution.
1) If you want to work with Total order sorting, we need to create Partition file, and then we have to run Mapreduce job using TotalOrderPartitioner class.
2) We will create partition file, by using InputSampler class, which is used to do sampling of the whole dataset.
3) There are basically two kinds of samplers that we mostly use.
4) First one is RandomSampler, which is mainly used to pick random samples from the original dataset. And the second one is, IntervalSampler, which is mainly used to pick the sample for every R number of records. In the practical demonstration I have used RandomSampler class to pick the samples from Original dataset.
5) Once all the meaningful samples are extracted from the dataset, it will sort those keys, and pick N-1 keys from those sorted keys where N is number of reducers and it places in a Partition file which is used for Total order sorting.
1) This is an overview of Total Order Sorting, here it show how it generates the Partition file and also it shows how the Mapreduce job uses this Partition file during Total Order Sorting.
1) This is a code Sample for Total Order Sorting, in this we have specified the sampler object as RandomSample class. And we also set the Number of reducers using setNumReduceTasks().
And also we specified the Partionfile location unsing setPartionfile() of TotalOrderPartitioner class.
And at last we have used writePartitionFile() of InputSampler class for creating Partition file.
Java 기반의 오픈 소스 GIS 맵서버인 GeoServer 2.4 버젼 한국어 사용자 지침서입니다.
본 사용자 지침서는 한국정보통신산업진흥원(NIPA)이 시행한 공개SW커뮤니티 출판·번역지원 사업을 통해 번역, 출판되었습니다.
본 문서의 라이선스는 Creative-Commons-Attribution 3.0 Unported를 따릅니다.
Angular 2 is now in Beta. Reactive Programming techniques are used to write the Angular 2 framework. The same APIs and techniques are exposed to Angular 2 developers for application development.
This talk, by Geoff Filippi, will start with a brief overview of Reactive Programming. We will define and discuss the Observer Design Pattern. Observables are implemented in RxJS and are under consideration for standardization in ES2016. We will compare Observables, Promises, Events and callbacks. We will also discuss how Promises, Events and callbacks can be bridged into Observables.
Finally we will discuss how RxJS is used to implement Angular 2. We will explore how Observables are used in change detection and ngZone, Http, Async Facade and Forms. We will also discuss how to make use of RxJS and Observables in our Angular 2 applications.
download for better quality - Learn about the sequence and traverse functions
through the work of Runar Bjarnason and Paul Chiusano, authors of Functional Programming in Scala https://www.manning.com/books/functional-programming-in-scala, and others (Martin Odersky, Derek Wyatt, Adelbert Chang)
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015NAVER / MusicPlatform
youtube : https://youtu.be/E_Bgv9upahI
비동기 이벤트 기반의 라이브러리로만 생각 했던 RxJava가 지금 이 시대 프로그래머에게 닥쳐 올 커다란 메시지라는 사실을 알게 된 지금. 현장에서 직접 느낀 RxJava의 본질인 Function Reactive Programming(FRP)에 대해 우리가 잘 아는 Java 이야기로 풀어 보고 ReactiveX(RxJava) 개발을 위한 서버 환경에 대한 이해와 SpringFramework, Netty에서의 RxJava를 어떻게 이용 하고 개발 했는지 공유 하고자 합니다.
How to Extend Apache Spark with Customized OptimizationsDatabricks
There are a growing set of optimization mechanisms that allow you to achieve competitive SQL performance. Spark has extension points that help third parties to add customizations and optimizations without needing these optimizations to be merged into Apache Spark. This is very powerful and helps extensibility. We have added some enhancements to the existing extension points framework to enable some fine grained control. This talk will be a deep dive at the extension points that is available in Spark today. We will also talk about the enhancements to this API that we developed to help make this API more powerful. This talk will be of benefit to developers who are looking to customize Spark in their deployments.
My talk at Lucene/Solr Revolution 2017, Las Vegas
The improved plugin system being proposed in this talk utilizes PF4J to add bundle packaging (zip/jar), plugin discovery (repositories), one-line install/upgrade and automatic version compatibility checks. Think of it as Homebrew or Apt-Get for Solr :) The hope is that this will encourage hundreds of new plugins being created and thus give Solr developers a sense of community and a new “stage” to perform on.
Practical non blocking microservices in java 8Michal Balinski
How to write application in Java 8 that do not waste resources and which can maximize effective utilization of CPU/RAM. Comparison of blocking and non-blocking approach for I/O and application services. Based on microservices implementing simple business logic in security/cryptography/payments domain. Demonstration of following aspects:
* NIO at all edges of application
* popular libraries that support NIO
* single instance scalability
* performance metrics (incl. throughput and latency)
* resources utilization
* code readability with CompletableFuture
* application maintenance and debugging
All above based on our experiences gathered during development of software platforms at Oberthur Technologies R&D Poland.
Learning log4j for Java beginners with a sample set of projects using log4j 1.2.
This was done for Level 2 computer engineering students at University of Moratuwa 2015.
I have hosted the samples in github (https://github.com/lkamal/log4j-workshop
), so that you will be able to download and try yourself.
When developing applications we have a hard time managing application state, and that is okay because managing application state is hard. We will try to make it easier using Redux.
Redux is predictable state management container for JavaScript applications that helps us manage our state while also making our state mutations predictable.
Through the presentation and code, I will show you how I solved my state problem with Redux in React application.
1) Total order sorting is another kind of sorting technique, where map output keys are sorted across all the reducers.
2) This technique uses, where you want to extract the most popular URLs from a web graph.
1) By default Mapreduce uses HashPartitioner as its Partitioner class, which partitions using a hash of the map output keys.
2) Also HashPartitioner ensures that all records with the same map output key goes to the same reducer, but it doesn’t perform total sorting of the map output keys across all the reducers.
3) For this reason only TotalOrderPartitioner class is introduced, which is by default packed with the Hadoop distribution.
1) If you want to work with Total order sorting, we need to create Partition file, and then we have to run Mapreduce job using TotalOrderPartitioner class.
2) We will create partition file, by using InputSampler class, which is used to do sampling of the whole dataset.
3) There are basically two kinds of samplers that we mostly use.
4) First one is RandomSampler, which is mainly used to pick random samples from the original dataset. And the second one is, IntervalSampler, which is mainly used to pick the sample for every R number of records. In the practical demonstration I have used RandomSampler class to pick the samples from Original dataset.
5) Once all the meaningful samples are extracted from the dataset, it will sort those keys, and pick N-1 keys from those sorted keys where N is number of reducers and it places in a Partition file which is used for Total order sorting.
1) This is an overview of Total Order Sorting, here it show how it generates the Partition file and also it shows how the Mapreduce job uses this Partition file during Total Order Sorting.
1) This is a code Sample for Total Order Sorting, in this we have specified the sampler object as RandomSample class. And we also set the Number of reducers using setNumReduceTasks().
And also we specified the Partionfile location unsing setPartionfile() of TotalOrderPartitioner class.
And at last we have used writePartitionFile() of InputSampler class for creating Partition file.
Java 기반의 오픈 소스 GIS 맵서버인 GeoServer 2.4 버젼 한국어 사용자 지침서입니다.
본 사용자 지침서는 한국정보통신산업진흥원(NIPA)이 시행한 공개SW커뮤니티 출판·번역지원 사업을 통해 번역, 출판되었습니다.
본 문서의 라이선스는 Creative-Commons-Attribution 3.0 Unported를 따릅니다.
Light Tutorial Python
Studybee 2주차 스터디, 가볍게 보는 파이썬!!!
가볍게 파이썬에 대해서 공부하는 시간입니다~
**http://www.studybee.kr 에서 운영하는 '초심자를 위한 웹개발' 클래스에서 만드는 교재이며,
장고를 이용해 간단하게 블로그를 만드는 것을 목표로 하고 있습니다.
7. xml 주요 class
ElementTree는 전체 XML 문서를 트리로 나타내고 Element는
이 트리에서 단일 노드를 나타냄. 전체 문서와의 상호 작용 (파
일 읽기 및 쓰기)은 일반적으로 ElementTree 수준에서 수행되고,
단일 XML 요소 및 해당 하위 요소와의 상호 작용은 요소 수준에
서 수행
7
Element
ElementTree
단순하지만 유연한 컨테이너 객체로 단순화 된 XML
정보 세트와 같은 계층 적 데이터 구조를 메모리에 저
장하도록 설계
XML 파일을 Element 객체들의 트리로 로드하고 다
시 저장하기 위한 기능를 추가
8. xml tree : ElementTree
XML 문서는 요소 트리로 구성하며, XML 트리는 루
트 요소에서 시작하여 루트 요소에서 하위 요소로 분
기, 모든 요소는 하위 요소 (하위 요소)를 가짐
8
11. Element type
계층적 데이터 구조를 메모리에 저장하도록 설
계된 유연한 컨테이너 객체
tag : 이 요소가 나타내는 데이터의 종류 (요소 유형, 즉)를 나타내는
문자열
attrib : 파이썬 사전에 저장된 다수의 속성.
text : 내용을 담을 텍스트 문자열 및 후행 텍스트를 보관할 문자열
child element : 파이썬 시퀀스에 저장된 다수의 자식 요소들
11
12. Element vs attribute
속성에는 여러 값을 포함 하지 않고, 트리 구조를 포
함하지 않으며, 쉽게 확장 할 수 없으므로 속성 생성
이 특별히 주의해야 함
12
Element로 지정 속성으로 지정
16. Must Have a Closing Tag
xml은 반드시 element에 closing tag를 가져야
함
16
17. Must be Properly Nested
xml은 반드시 element에 closing tag가 한쌍으
로 구성되어 내포되어야 함
17
Element 순서가
맞아야 함
18. Entity References
xml 내부에 text 작성 특별한 문자를 사용하도
록 제공
18
< < less than
> > greater than
& & ampersand
' ' apostrophe
" " quotation ma
rk
21. Element Naming Styles
XML Naming Styles
21
Style Example Description
Lower case <firstname> All letters lower case
Upper case <FIRSTNAME> All letters upper case
Underscore <first_name> Underscore separates words
Pascal case <FirstName> Uppercase first letter in each word
Camel case <firstName> Uppercase first letter in each word exce
pt the first
22. Solving the Name Conflict
xml tag의 이름 충돌을 방지하기 위해 prefix로
tag를 식별하기 위해 xmlns 속성을 사용
[xmlns:prefix="URI“ ]
22
25. xpath
XPath는 XSLT 표준의 주요 요소, XPath는 XML
문서의 요소와 속성을 탐색하는 데 사용
25
XPath는 XML 문서의 일부를 정의하는 구문
XPath는 경로 표현식을 사용하여 XML 문서를 탐색
XPath에는 표준 함수 라이브러리가 포함
XPath는 XSLT 및 XQuery의 주요 요소
XPath는 W3C 권장 사항
26. xpath notation 1
xpath notation
26
syntax meaning
tag(node)
지정된 태그가있는 모든 자식 요소를 선택합니다. 예를 들어, "spam"은 "spam"이
라는 이름의 모든 하위 요소를 선택하고 "spam / egg"는 "spam"이라는 이름의 모
든 하위 요소에서 "egg"라는 이름의 모든 손자를 선택합니다. 범용 이름 ( "{url}
local")을 태그로 사용할 수 있습니다.
/ Root node로 부터 선택
//
현재 요소 아래의 모든 레벨에있는 모든 하위 요소를 선택합니다 (전체 하위 트리
검색). 예를 들어 ".//egg"는 전체 트리에서 모든 "egg"요소를 선택합니다.
27. xpath notation 2
xpath notation
27
syntax meaning
.
현재 노드를 선택하십시오.
이것은 경로의 시작 부분에서 상대 경로임을 나타내기 위해 주로 유용합니다.
.. 상위 요소를 선택합니다.
*
모든 하위 요소를 선택합니다. 예를 들어 "* / egg"는 "egg"라는 이름의 모든 손자
를 선택합니다.
@ 속성을 선택
28. xpath notation 3
xpath notation
28
syntax meaning
[@attrib]
주어진 속성을 가진 모든 요소를 선택합니다. 예를 들어 ".//a[@href]"는 트리에서
"href"속성이있는 모든 "a"요소를 선택합니다.
[@attrib=’value’]
지정된 속성이 지정된 값을 가지는 모든 요소를 선택합니다. 예를 들어
".//div[@class='sidebar ']"는 클래스의 "sidebar"가있는 트리의 모든 "div"요소
를 선택합니다. 현재 릴리스에서는 값에 따옴표를 사용할 수 없습니다.
[tag]
tag라는 하위 요소가 있는 모든 요소를 선택합니다. 현재 버전에서는 태그 하나만
사용할 수 있습니다 (즉각적인 자식 만 지원됨).
[position]
(지정된 위치에 있는 모든 요소를 선택합니다. 위치는 정수 (1이 첫 번째 위치 임),
표현식 "last ()"(마지막 위치) 또는 last ()에 상대적인 위치 (예 : 두 번째 행의 "last
() - 1") 일 수 있습니다. 마지막 위치). 이 술어에는 태그 이름이 있어야 합니다.
29. xpath 처리 예시
xpath 처리 예시
29
XPath Expression Result
/bookstore/book[1] bookstore 의 첫번째 자식 book 검색
/bookstore/book[last()]
bookstore 의 마지막 자식 book 검색
/bookstore/book[last()-1] bookstore 의 마지막 -1 자식 book 검색
/bookstore/book[position()<3] bookstore 의 첫번째와 두번째 자식 book 검색
//title[@lang] title 중에 lang 속성 검색
//title[@lang='en'] title 중에 lang 속성=‘en’ 검색
/bookstore/book[price>35.00] bookstore 의 price 가격이 35.00보다 큰 자식 book 검색
/bookstore/book[price>35.00]/title bookstore 의 price 가격이 35.00보다 큰 자식 book 내의 title tag검색
72. find 메소드 특징
find/findall/findtext 메소드 특징
72
find (pattern)는 주어진 패턴과 일치하는 첫 번째 하위 요소를 반환하고, 일
치하는 요소가 없으면 None을 반환
findtext (pattern)은 주어진 패턴과 일치하는 첫 번째 하위 요소의 text 속성
값을 반환합니다. 일치하는 요소가 없으면이 메서드는 None 을 반환
findall (pattern)은 주어진 패턴과 일치하는 모든 서브 엘리먼트의리스트 (또
는 또 다른 반복 가능한 객체)를 반환
73. get 메소드 특징
getiterator/getchildren 메소드 특징
73
getiterator (tag)는 서브 트리의 모든 레벨에서 주어진 태그를 가진 모든 서
브 엘리먼트를 포함하는리스트 (또는 또 다른 반복 가능한 객체)를 리턴
요소는 문서 순서대로 반환 (즉, 트리를 XML 파일로 저장 한 경우 나타나는
순서와 동일한 순서로).
getiterator () (인수 없음)는 서브 트리에있는 모든 하위 요소의 목록 (또는 또
다른 반복 가능한 객체)을 반환
getchildren ()은 모든 직접 하위 요소의 목록 (또는 반복 가능한 다른 객체)을
반환합니다. 이 메소드는 더 이상 사용되지 않음
새로운 코드는 자식에 액세스하기 위해 인덱싱 또는 분할을 사용하거나 목록
을 가져 오기 위해 목록 (elem)을 사용
108. Element/attribute 표현차이
element는 계층적 데이터 구조를 메모리에 저장하도록
설계된 유연한 컨테이너 객체이고,
attribute은 요소 내에 정보나 데이터를 표현하기 위한
방법으로 시작태그의 일부로 표현하며 속성명과 속성 값
으로 이루어짐
108
Element
tag
attribute
text
element로 표현과 attribute으로 표현의 차이
attribute으로 설계할 경우
- 문서 크기를 줄일 수 있지만 속성 자체가
요소에 종속적이기 때문에 확장성이 떨어짐
element로 설계할 경우
- 가독성이 좋아 의미를 쉽게 파악할 수 있으며
향후 문서 내용을 추가하거나 확장할 때 용이