2. eBook, eJournal,
Paper, Patent, Judgment
Keyword
Search
Motivation for New Search Engine
Need to read through the document to find the passage of interest
3.
• Identical Structure and Algorithm
• No differentiated value
• Innovative Structure and Algorithm
• Totally new search
New-concept search service
Keyword
Searches
Delayed
Indexing
Low
efficient
Resource
High
Cost
Resource
Back end
Pre/Post
processing
Limitations of the Keyword Search
4. Index DB Structure
Architecture of OZ Search
Memory based design Resources optimizing Indexing & searching speed
Index Structure Search and Index Algorithms
Shared
index DB
Multi-level
Hashing
Bucket
slots
Low Cost FnByte sharing
Bit type format algorithms Block Sorting
Memory
Optimizing
Word Pool
Hash index
Expansion
for key stroke
Typo Correction
Auto-completion
for every keyword
Ranking
for key stroke
index
Inverted Data structure
+
OzKsana
Instant
OzBasic
Enter
OzDnS
Text block Instant
Search
OzMarker
Brand
OzAim
Big Data in
memory
Search Engines Applications Products
Similarity
Analyzer
Crony
Patent
+
5. Frequency rate
Precision rate
No Trade Off
Resource Sharing
Minimizing Duplication
Shared Index
DB
Memory based
Keyword location mngt
OzSearch
B+ tree
General Index ST
+
index & keywords
Slim
Engine
Through the shared index structure, resources are saved by more than 50%
and memory-based search is guaranteed for big data
Trie
Frequency rate
Precision rate
Trade Off
Index Structure of OzSearch
6. Memory based Index DB : Employing Multi level Hash index
* How to treat collision and sort
Bucket Blocks
b~
buckets
(prime#2)
Data…
…
…
Bucket
Blocks
buckets
(Prime#1)
a~
…
Sorted data
b~ …overflow
… …
ㄱ~ …
ㄴ~ …
… …
Multi Level Hash Index(conceptual diagram)
Sort Blocks Data Set
aa~ Data(a)
ba~ Data(b)
… …
가~ …
나~ …
… …
Sorted Data Slots
Shell Sort
Data
Hash function
Slot Data Sort
Sort Data Block creation
Data: allocating corresponding block
Shell Sort
Sort blocks: sequential merging
Hash collision handling
Data: hash function (prime #1), then bucket allocation
If bucket(n) overflows: hash function (prime #2)
Next-level bucket(n) creation
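The collision handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the OzSearch implementation: the class name, the string hash, the prime values and the slot capacity are all hypothetical, and the slide's Shell sort is replaced here by sorted insertion with `bisect.insort` to keep the example short.

```python
from bisect import insort


class MultiLevelHashIndex:
    """Two-level hash index sketch (hypothetical names and parameters).

    Level 1 has prime1 buckets with a fixed number of slots. When bucket n
    is full, the key goes to a level-2 table for bucket n, which is created
    on demand and addressed with a second prime.
    """

    def __init__(self, prime1=7, prime2=5, slots=2):
        self.prime1, self.prime2, self.slots = prime1, prime2, slots
        self.level1 = [[] for _ in range(prime1)]
        self.level2 = {}  # bucket number -> list of prime2 sorted overflow lists

    @staticmethod
    def _code(key):
        # Deterministic string hash (an assumption; any stable hash would do).
        h = 0
        for ch in key:
            h = h * 131 + ord(ch)
        return h

    def insert(self, key):
        code = self._code(key)
        n = code % self.prime1
        bucket = self.level1[n]
        if key in bucket:
            return
        if len(bucket) < self.slots:
            insort(bucket, key)  # keep slot data sorted
            return
        # Bucket n overflowed: use the second prime on the next level.
        sub = self.level2.setdefault(n, [[] for _ in range(self.prime2)])
        slot = sub[(code // self.prime1) % self.prime2]
        if key not in slot:
            insort(slot, key)

    def contains(self, key):
        code = self._code(key)
        n = code % self.prime1
        if key in self.level1[n]:
            return True
        sub = self.level2.get(n)
        return sub is not None and key in sub[(code // self.prime1) % self.prime2]


index = MultiLevelHashIndex(prime1=3, prime2=5, slots=1)
for word in ["k", "ko", "kor", "kore", "korea", "korean", "가", "나"]:
    index.insert(word)
print(index.contains("korea"), index.contains("japan"))
```

With three one-slot buckets and eight keys, at least one bucket must overflow, so the second level is exercised even on this tiny data set.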
7. Sample data set
Index
k
ko
kor
kore
korea
korean
Keyword dids
ko #1, #2
korea #1, #3
korean #4
dids Keyword
#1, #2, #3, #4 k
#1, #2, #3, #4 ko
#1, #3, #4 kor
#1, #3, #4 kore
#1, #3, #4 korea
#4 korean
#1 : ko, korea #2 : ko #3 : korea #4 : korean
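OzSearch vs. general prefix-match structure
[Chart: index size vs. document volume, OzSearch vs. ordinary engines]
[Chart: pointer count and Did count, general structure vs. OzSearch]
Item | General structure | OzSearch
Index count | 6 | 6
Keyword count | 6 | 3
Did count | 18 | 5
Pointer count | 12 | 9
Operation load | Medium | Low
Remarks | - | -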
• Small index DB: lower operation load, faster search
• The bigger the data size, the more resources can be saved
Resource saving
Shared index Structure
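The Did counts in the sample data set (18 for the general prefix-match structure, 5 for the shared structure) can be reproduced with a small Python sketch. The function names and the two-dictionary layout are assumptions made for illustration. The slide does not describe the actual OzSearch pointer layout, so pointer counts are not modelled here.

```python
# Sample data set from the slide: document id -> keywords
docs = {1: ["ko", "korea"], 2: ["ko"], 3: ["korea"], 4: ["korean"]}
index_terms = ["k", "ko", "kor", "kore", "korea", "korean"]


def build_ordinary(docs, index_terms):
    """General prefix-match structure: every index entry stores its own did list."""
    return {
        term: sorted(d for d, kws in docs.items()
                     if any(kw.startswith(term) for kw in kws))
        for term in index_terms
    }


def build_shared(docs, index_terms):
    """Shared structure (sketch): dids are stored once per real keyword,
    and index entries only refer to the keywords they cover."""
    keyword_dids = {}
    for did, kws in docs.items():
        for kw in kws:
            keyword_dids.setdefault(kw, []).append(did)
    prefix_to_keywords = {
        term: sorted(kw for kw in keyword_dids if kw.startswith(term))
        for term in index_terms
    }
    return prefix_to_keywords, keyword_dids


def search_shared(term, prefix_to_keywords, keyword_dids):
    """Resolve an index entry to dids by following the shared keyword lists."""
    dids = set()
    for kw in prefix_to_keywords.get(term, []):
        dids.update(keyword_dids[kw])
    return sorted(dids)


ordinary = build_ordinary(docs, index_terms)
prefix_to_keywords, keyword_dids = build_shared(docs, index_terms)

print(sum(len(v) for v in ordinary.values()))      # 18 dids stored
print(sum(len(v) for v in keyword_dids.values()))  # 5 dids stored
print(len(keyword_dids))                           # 3 keywords
```

Both structures return the same result for every index entry. The shared one simply stores each did once per keyword instead of once per prefix.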
8. Utilizing Low Cost Functions
1) Macro: processing-time measurement for every module, time-delay analysis
2) Micro: performance check for every library / function
1) Macro analysis: uses Google Performance Tools
Processing-time check (CPU profiler) for every module
Delayed modules: logic improvement or micro analysis
Sample data: Wikipedia
2) Micro analysis: atoi() function example
1 billion ASCII-to-integer conversions
atoi() function: about 30 sec
New code: 0.3 to 3 sec
9. Memory data reduction technology
SNS,
Internet
……
DBMS,
File
Documents
sensor
Standard
Input file
Bit divide
Inverted
File create
01010011
00110011
Byte
encoder
Column wise
Bit
grouping
Re-position
Code Temp
encoder
Formatter
0101
0011 x 3
acde001
defg002
fghi003
…
1,2,3,…
acde001
defg002
fghi003
…
Output
0011
0011
1100
1100
……
- Memory (Data type simplifying + Byte sharing + slimed data ST)
- Disk I/O(usage frequency Grouping + data reduction)
OP
Analysis
10. Large-volume data operation algorithm (wanted-vehicle lookup, 250 million records/day)
[Chart: tps, general algorithm vs. OzSearch]
Ex(1)
Wanted car surveillance CCTV data: 0.25 bil images/day
Intentional changes: 1→4, 3→8, 마→머, …
Require real time search
Minimum
Comparison
3 digit misrecognition 7C1 + 7C2 + 7C3 = 7 + 21 + 105 = 133 + right recog. 1 time = 134
Algorithm
General algorithm: 3,000 images/s * 134 cases = 402,000 tps
OzSearch algorithm: 3,000 images/s * 0.2 s/image = 600 tps
Proof: recognition failure not counted
1. Word correction algorithm
2. Character comparison algorithm to find similar trade
mark
Operation Algorithm and inverted file
Ex(2)
KR trade mark search system
About 5 million trademarks: invert file creation time
Algorithm | Current mechanism with data mining | OzMarker
Invert file creation | 5 mil * (38 = 8 digit * 3 similar char) = 3.2 bil indices | 5 mil indices
Creation time | 3.2 bil * 0.00003 sec/case = 26.7 hr | 5 mil * 0.00003 sec/case = 150 s
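The creation times quoted above follow directly from the index counts and the per-index cost. A short Python check, taking the slide's figures (3.2 billion versus 5 million indices, 0.00003 sec per index) as given, could look like this. The function name is only illustrative.

```python
MICROSECONDS_PER_INDEX = 30  # 0.00003 sec/case, as quoted on the slide


def indexing_seconds(index_count: int) -> float:
    """Total inverted-file creation time for a given number of index entries."""
    return index_count * MICROSECONDS_PER_INDEX / 1_000_000


current = indexing_seconds(3_200_000_000)  # similar-character expansion
ozmarker = indexing_seconds(5_000_000)     # one index entry per trademark

print(f"current:  {current:,.0f} s = {current / 3600:.1f} hr")
print(f"OzMarker: {ozmarker:,.0f} s")
```

The per-index cost is the same in both columns. The whole difference comes from the number of index entries that have to be created.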
11. Inverted file size & capacity comparison
[Chart: Index size and searching time comparison (BMT results), competitor vs. BeCurio. Panels: Yes24 (index size, GB); Shopping mall 1 (index size, GB); sentence search (time, min); quote trading (processing capacity, relative value)]
[Chart: Big Data invert size comparison (example: 1 billion stock trades from 1 million accounts), required space (GB): Row Wise 31, Invert File 50, OzSearch 4.5]
[Chart: Index count comparison (2.3 million similar-trademark index data), index count: existing indexing method 3.2 billion, OzSearch algorithm 5 million]
[Chart: Big data operation algorithm (wanted-vehicle lookup, 250 million records/day), tps: general algorithm 402,000, OzSearch 600]
12. Memory Reduction Example
1) Row-wise DB structure ≒ 31GB
2) Basic inverted data structure ≒ 50GB
* As cases/columns increase, more storage space is required
3) Memory-reduction data structure ≒ 4.5GB
* As cases/columns increase, efficiency also increases
[Chart: index size (GB), required space: Row Wise 31, C/W index 50, Optimized 4.5]
Name SSN ACC #
…
…
…
1 million * (20 bytes + 13 bytes + 20 bytes) = 53MB
ACC # Designated
Code
Mass trx y/n?
…
…
…
1 billion * (20 bytes + 10 bytes + 1 byte) = 31GB
~~~
Name SSN ACC #
…
…
…
1 million * (20 bytes + 13 bytes + 20 bytes) = 53MB
Mass trx 0 ACC # …… ACC #ACC #
Mass trx 1 ACC # ACC #ACC # ……
ACC #
Designated
Code
…
…
…
~~~
(20 bytes) * 1 billion = 20GB
1 billion * (20 bytes + 10 bytes) = 30GB
Original Data
53MB
TRX data
≒ 4.4GB
Example
Case: 1 billion transactions from 1 million accounts at 10 thousand branches,
indexing SSN, account #, name, and mass-trx flag (64-bit OS)
13. Original Data
Memory Reduction:
Data Structure and Capacity
Column SSD(Offset)
Original (53MB in total)
Name + SSN | 1M | 33 bytes
Account # | 1M | 20 bytes (64-bit hash indexing)
Designated # | 10,000 | 10 bytes
TRX
Mass TRX | 1B | 1 bit | 125MB
TRX | 1B | 20 bits (1M ACC) + 14 bits (10,000 Designated #) | 4.25GB
Memory reduction data ST and Size
Mass TRX 1B bits 0011010101000…… 110101010001101010
ACC # 1M 20 bytes …..
Designated # 10,000(10 bytes) …
…..
TRX data (1B) Abs ACC 20 bits (1M) Abs code 14 bits (10,000) ……
Analysis/OP format
53MB
125MB
4.25GB
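The sizes above can be checked with a small Python sketch of the bit-level record layout: each transaction stores a 20-bit account index and a 14-bit designated-code index instead of the 20-byte and 10-byte original values. The function names and the packing order are assumptions for illustration. The slide gives only the field widths.

```python
ACC_BITS = 20   # 2**20 = 1,048,576, enough for 1M accounts
CODE_BITS = 14  # 2**14 = 16,384, enough for 10,000 designated codes


def pack(acc_idx: int, code_idx: int) -> int:
    """Pack one transaction into a 34-bit integer (account index, code index)."""
    if not (0 <= acc_idx < 1 << ACC_BITS and 0 <= code_idx < 1 << CODE_BITS):
        raise ValueError("index out of range for its bit width")
    return (acc_idx << CODE_BITS) | code_idx


def unpack(record: int) -> tuple[int, int]:
    """Recover (account index, code index) from a packed record."""
    return record >> CODE_BITS, record & ((1 << CODE_BITS) - 1)


def size_bytes(n_records: int, bits_per_record: int) -> int:
    """Storage needed when records are laid out back to back at bit granularity."""
    return n_records * bits_per_record // 8


ONE_BILLION = 1_000_000_000
print(size_bytes(ONE_BILLION, ACC_BITS + CODE_BITS))  # 4,250,000,000 bytes = 4.25GB
print(size_bytes(ONE_BILLION, 1))                     # 125,000,000 bytes = 125MB (mass-trx bitmap)
print(1_000_000 * (20 + 13 + 20))                     # 53,000,000 bytes = 53MB (original data)
print(ONE_BILLION * (20 + 10 + 1))                    # 31,000,000,000 bytes = 31GB (row-wise)
```

Together the three reduced parts come to about 4.4GB, against 31GB for the row-wise layout of the same transactions.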
14. Memory Reduction Developed
Improving
Learning func. For analysis
Data Type reduction Byte sep/share
Super light inverted data Structure
Standard user data definition API
DISK I/O reduction
Data Type
reduction
Column wise
compression
Server/Index
distributed/parallel processing
Essential algorithm for each part
I/O
Super light index DB structure
Map reduce / comm. tech
Memory Processing
Distribution operation
Parallel Processing
I/O
DISK I/O reduction
Reduce Communication
between servers
Big data analysis (NoSQL type)
More than hundred libraries
Big Data in Memory Technology
50% resource saving
search engine technology
Developing
Generalized Unstructured
mass data processing
Query
Optimizer
Core Technology
1) Search Engine indexing Structure and Algorithm
2) String Management related data structure and algorithm
3) Memory / resource efficiency enhancement library
4) Big Data in Memory related technology (based on Search Engine Technology)
Status Quo of BeCurio Technology
15. Product Explanation
Search
Engines
(OzSearch)
Keyword
Search
Basic
Memory/DISK based resource sharing keyword search engine
More than 50% index size reduction
OzParser
Integrated phoneme analyzer
Just for search engine
Instant Search
Memory based
OzKsana
Real time keyword recommendation for each character input
Real time customized ranking/indexing based on group
Compare with Google instant search
OzSniper
AND search for each character, phoneme Analyzer
Powerful spelling correction
Text block Search OzDns
Real time web based text block search
Super fast and light location data index structure
No preprocessing
Algorithms
Customized Search RERE
Real time super fast ranking for each character input
Based on keyword chain patent registration technology
Typo Correction OzFix
resource saving more than 100 times
Optimizing accuracy and flexibility
Compare with Google
Super fast search Algorithm
Auto completion and expansion for every keyword
Dramatic speed improvement and resource reduction logic
Solutions
Big Data in Memory OzAiM
Memory-reduction data structure
ANSI query (NoSql type) Analysis
Real time Group by/Order by
for more than 10B data
Similar Trade mark Search OzMarker
Based on OzFix algorithm
Super fast indexing, enhanced accuracy and flexibility
6M trade mark indexing
24 hr : 100 sec
Prior Patent Search Crony
Avoiding search formula by experts
Avoiding existing similarity analysis algorithm
1/10 resource + 10 times
faster speed
Plagiarism Checker OzSoS
Plagiarism checker based on DnS
Super fast, high accuracy real time text block similarity
analysis
BeCurio Products
16. An Example of Patent Search Formula
( (web* or internet* or network*) and (brows*) and (HTML* or HTTP* or XML* or Markup*
or javascript*) ) OR (((remote* and naviga*) or (spatial adj10 naviga*) or (arrow and (key
adj10 naviga*)) or (directional and (key adj10 naviga*)) or (user adj10 interface and
naviga*)) and (brows* or menu) ) OR ((((리모컨 or 리모콘) or (화살* and 키) or (방향 and
키) or (유저 adj10 인터페이스)) and 화면 and 선택) ) OR ( (((web) and (client* or browser*)
and ((remote* adj10 control*) or (cursor adj10 navigation) or layout) or (web* or internet*
or network*) and (brows* or navigat*) and ((remote* adj10 control*) or (user adj interfac*)
or layout*))) ) OR ( (gui or presentation) and engine and (XML* or script* or Java*) ) OR
( (web and application and framework) or (web and application and platform) or (web and
rich and internet and application) or (web and ria) or (web and ajax) or (web and
asynchronous and javascript and xml) or (widget and web) or (gadget and web) or (rss and
web) or (really adj3 simple adj3 syndication adj3 web) or (web and ((smart and client) or
(smart and agent))) or (web and downloadable adj10 application) or XAML or XUL or MXML
or (interface and element and web) ) OR ( (웹 and *플리케이션 and 프레임*) or (웹 and *플
리케이션 and 플랫폼) or RIA or AJAX or *이젝스 or *이잭스 or 아작스 or 위짓 or 위젯 or
widget or 가젯 or 가짓 or gadget or RSS or (웹 and 맞춤형정보배달) or (웹 and 스마트 and
클라이언트) or (웹 and 스마트 and 에이전트) or (웹 and 스마트 and 에이젼트) or (웹 and 다
운* and *플리케이션) or XAML or XUL or MXML ) OR ( (CE or (TV or television) or DTV or
(digital adj2 (TV or television))) and (service or (web adj10 service) or (mash adj up adj10
service)) ) OR ( (Opera or Yahoo or Konfabulator or Google or Microsoft or ANT or Mozilla or
Netscape or MacroMedia or IBM or HP).AP. )
17. Crony: New Patent Search
Keyword search Document Search
Technology
Instant
Search
Text block
Content &
Location
Prior
Patent
Search
Text block
location search
19. Crony System
Crony vs. Keyword Search
Legacy Patent Search
Keyword
Search Engine
Search formula
Keyword index
DnS Text
block Search
Patent DB
Keyword + location
Auto block separation
Patent DB
Text block search index DB
Keyword
index DB
Keyword Search
result
Content + Location search
result
User
within seconds
by ordinary user
Days or weeks
Only by expert
Preprocessing Sentence/Paragraph
Hash Fn creation
Similarity
Analysis
Finger Printing
Cosθ
Time delay
Accuracy issue
Sizable index DB
Tweaked Sentence
issue
Similarity
Analysis
# of identical KWD,
Density, sequence, etc
No Pre-Processing
20. Processing role speed
Text block
search filter
Extracting the target for
refined analysis
Generic index DB search
Within a second
Extracting
thousands of cases
Location data
TB filter
TB automatic separation
Identical keywords,
Distance analysis
Within a second
Extracting
hundreds of cases
Refined
analysis
engine
Text block analysis
- Frequency, distance,
sequence
- Location calculation
Within seconds
Min. H/W
Assumption: 10 page 3M
8GB MM, more than 2CPU, PC server
Feature
Min additional job(verification DB creation,
etc) short term development and launch
Millions of
Patent data
1000 Refined analysis
engine
similarity
result
Minimize
operation
load
Rough
Text block
operation
Refined
Text block
Operation
Text block
operation
filter
DataQuantity
Text block
search Filter
1. Innovative structure by location based keyword analysis
2. Super fast and highly accurate similarity checking with 1/10 of
resource
Speed and Accuracy
21. • Similar keyword detection
for every keyword
• Easy to use
• User registration of similar
keywords
• Sg/Pl conversion
• tense conversion
• part of speech
conversion
Query Expansion
1. Detecting intentional search avoidance
2. Automatic Query expansion for similarity analysis
Typo correction Root Keyword
Search
• Intentional type
• correcting multiple words
with no spacing
• Powerful correction
algorithm
Other features
• Detect change in word
sequence
• Detect spacing change
• Detect partial change in
words
Sentence structure
22. Text block Indexing 1. No pre/post processing
2. Not Using Finger Printing or other previous indexing method
3. Location data indexing for every keyword
Detecting intentional avoidance
Target data1
Prior patent detection
2
3
Instant Search Engine and others
4
Current Crony System Coverage
5
Unique functions and differentiated service
1. High speed text block search
2. Words sequence check
1. US Patent 1976~ : 3.8 million samples
2. KR Patent
1. Multiple words typo correction (better than Google)
2. Similar query expansion
3. Root word search
1. Customizable coverage and accuracy control
2. Instant search for meta data
3. Saving function for the content of interest
23. Text block chained search
By right mouse click
Chained Search of Similar Patents
24. Crony: unique text block search
Real time web based
No preprocessing
Crony vs. eTBlast by Virginia Tech
OzDnS : eTBlast
25. X. User-customized search condition definition
Variable | Role | Effect
Disparity | Distance b/w keywords | Accuracy and number of search results
Min text block size | Minimum # of words in text block | Control # of search results
Keyword weight | Frequency and boundary | Ranking adjustment
Max text block size | Maximum # of words in text | Visual representation
Word gap | Text block separation | Accuracy, # of search results
Word order | Keyword order accordance | Accuracy
Variable Controls
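To make the variable controls more concrete, the following Python sketch shows one way a keyword location index could drive text block search. It is an assumption-laden illustration, not the Crony or OzDnS algorithm: `build_location_index`, `find_text_blocks`, and the parameters `word_gap` and `min_keywords` are hypothetical names, loosely modelled on the "Word gap" and "Min text block size" controls.

```python
import re
from collections import defaultdict


def build_location_index(text):
    """Location index: keyword -> sorted list of word positions in the text."""
    index = defaultdict(list)
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        index[word].append(pos)
    return index


def find_text_blocks(index, query, word_gap=5, min_keywords=2):
    """Group query-keyword hits into text blocks.

    A new block starts when two neighbouring hits are more than word_gap
    positions apart. Blocks with fewer than min_keywords distinct query
    keywords are dropped. Returns (start, end, distinct_keywords) tuples,
    best match first.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    hits = sorted((pos, term) for term in terms for pos in index.get(term, []))
    blocks, current = [], []
    for hit in hits:
        if current and hit[0] - current[-1][0] > word_gap:
            blocks.append(current)
            current = []
        current.append(hit)
    if current:
        blocks.append(current)
    results = []
    for block in blocks:
        distinct = {term for _, term in block}
        if len(distinct) >= min_keywords:
            results.append((block[0][0], block[-1][0], len(distinct)))
    return sorted(results, key=lambda r: (-r[2], r[0]))


text = ("the remote control moves a cursor on the screen "
        + " ".join(["filler"] * 20)
        + " remote access only")
index = build_location_index(text)
print(find_text_blocks(index, "remote control cursor"))  # [(1, 5, 3)]
```

Because only positions are indexed, no per-sentence fingerprint has to be computed in advance. Widening `word_gap` or lowering `min_keywords` trades accuracy for a larger number of results.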
26. Applicable areas with Crony Search
• Patent Search/ Judgment Search/ Plagiarism Checker/ Quotation
Search/ eBook Content Search, etc
• Smart Contact Center
• automated text message feedback for the known questions
• Script Search
• Jumping to the video frame matching to the script line
• Removing repeated questions
• By instantly showing the similar questions to the character input
• Data Mining Search
• Data Mining with Search Interface (Anyone can do mining)
• Hyper-Knowledge Product: Sharing Knowledge with No effort
27. KM system, EDMS, Document, etc
Innovative system
Personal search pattern and storing
Increase core knowledge sharing
Innovative knowledge reference / sharing
model
Super fast instant search
Real time web based text block search
Creating value added
knowledge network
by quality knowledge
acquisition and sharing
Creating Knowledge eco-system
Hyper Knowledge Creation Model
i. Knowledge branch creation model
ii. core contents chain core content sharing
iii. Knowledge eco-system by specialized category
28. Hyper Knowledge
Current Document/Knowledge
management system
• Document life cycle management
• Document search by keyword
• Knowledge registration oriented
• Low Document utilization (Too many results)
Knowledge branching and sharing
system
• Increasing knowledge utilization by increasing
knowledge sharing
• Search and share core knowledge text block
• voluntary knowledge sharing
• Knowledge search based on text block similarity search
KMS EDMS CMS ERP… etc
Life Cycle mngt. Keyword search Document based
Not able to identify the content of
interest automatically
Search knowledge
and its location
Knowledge
branch
creation
Sharing
knowledge by
specialized
categories
Individual
saving text
block of
interest
Creating high
quality
knowledge
Legacy systems
Document mngt. sys
Hyper Knowledge
System
Individual
Knowledge
branch
Shared
Knowledge
network
29. Process of Knowledge Network Creation
Shared node
report(1)
Key Paragraph-1 + wikipedia/original link
Key Paragraph-21 + PCM /original link
Key Paragraph-31 + report/ link
Input Keyword : knowledge network
Date : 2013. 2. 15
Reference Docs : wikipedia / 15 paragraph
results
attachment
wikipedia
PCM…
DnS
TBS
Wikipedia
KOI
TBS
PCM
TBS
DnS
Instant
search
report
TBS
DnS
“knowledge
network”
Key Paragraph-51 + shared node Link
Knowledge
i. Individual knowledge search and management
ii. Personal core knowledge chain creation
iii. Automatic Knowledge network creation by specialized categories
Enhancing Knowledge creation
10 core
paragraphs
10 core
paragraphs
10 core
paragraphs
Infinite Knowledge Network
(10 documents 10 core knowledge)
Theoretical knowledge combination
10 * 10 * …… * 10 = 10^10 ≒ ∞
…
…
…
Knowledge sharing
Save
Key Paragraph-21
Key Paragraph-1
Key Paragraph-31
Common Interest
30. Core Knowledge Sharing Map
Title : Knowledge block?
Author : BeCurio Research Center
sentences …
Key Paragraph-1 …
Key Paragraph-2 …
Key Paragraph-3 …
Key Paragraph-1 + wikipedia/original link
Key Paragraph-21 + PCM/original link
Key Paragraph-31 + report/link
Input Keyword : Samsung
Date : 2013. 2. 15
Reference Docs : wikipedia / 15 paragraph
Query + source + key sentences + shared node link + …
report(1)
KOI drag
…(2)
KOI drag
…(3)
KOI drag
Post Docs.
Prior Docs.
sharing node table
Kwd/TB Search
doc name,
Text block location
ref. freq (Line thickness)
post doc name,
original Link
sharing node table
creation
sharing node Link
Sharing map creation
i. Extracting Core knowledge from huge documents
ii. Sharing by core text block similarity
iii. Creating knowledge links by specialized categories
Core Knowledge Sharing
31. K
box
Growing Knowledge Eco-System
Individual chain
Hyper Knowledge Creation Growing knowledge eco-system
Knowledge network
creation process
Individual search knowledge saving share node creation knowledge network creation
growing knowledge eco-system
Search technology Instant search TBS for text body Chain TBS for knowledge
effect
dramatically improving knowledge sharing, increasing core knowledge acquisition opportunity and
saving time high level knowledge creation
Category
Key Paragraph-1
Key Paragraph-21
share node
KP#12
KP#13
KP#22
KP#23
KP#31
KP#32
KP#311
KP#312
KP#n1
KP#n2
report
doc
www
K. network
expansion
…
…
…
paper
32. Instant Search with 3 No's
Keyword
+ Waiting Search Result
or No Result
Navigation
or re-search
Current Search Engine
Dramatic Improvement of Speed, Quality and Ease of Use
No enter key
No Waiting
No Zero Result
• Keyword
Search result
(keyword)
no result ?
Instant Correction for No Result
Typo/No result Keyword input
New Search
Result provided by each input key stroke
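The "3 no's" behaviour (a result on every keystroke, with instant correction instead of a zero result) can be sketched in Python as below. This is a hypothetical illustration, not the OzKsana or OzFix algorithm: the `InstantSearch` class is invented for the example, and the typo fallback uses the standard library's `difflib.get_close_matches` as a stand-in for the engine's own correction logic.

```python
from bisect import bisect_left
from difflib import get_close_matches


class InstantSearch:
    """Per-keystroke completion over a sorted keyword list, with a typo fallback."""

    def __init__(self, keywords):
        self.keywords = sorted({k.lower() for k in keywords})

    def complete(self, prefix, limit=5):
        """Return up to `limit` keywords that start with `prefix`."""
        prefix = prefix.lower()
        out = []
        for kw in self.keywords[bisect_left(self.keywords, prefix):]:
            if not kw.startswith(prefix) or len(out) == limit:
                break
            out.append(kw)
        return out

    def suggest(self, typed, limit=5):
        """Called on every keystroke: completions, or corrected completions
        when the typed prefix matches nothing."""
        hits = self.complete(typed, limit)
        if hits or not typed:
            return hits
        # No result: correct the typed prefix against known prefixes of the same length.
        candidates = sorted({kw[:len(typed)] for kw in self.keywords})
        corrected = get_close_matches(typed.lower(), candidates, n=1, cutoff=0.6)
        return self.complete(corrected[0], limit) if corrected else []


engine = InstantSearch(["memory", "memo", "member", "monitor", "mouse"])
for typed in ["m", "me", "mem", "mwm"]:
    print(typed, "->", engine.suggest(typed))
```

The mistyped prefix "mwm" has no direct match, so it is corrected to "mem" and the user still sees the completions for that prefix without pressing Enter or retyping.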
33. Purchase Department Instant Search Stock Management
Category for individual purchase
person
(IP / log in ID)
Raw Material information for
MEMORY
Current Stock information
Basic product info., related
company info.
Instant Search
Text block Drag & search
Super-fast text block content and
location search
Based on dramatically improved
Search speed,
a new algorithm
for text block search applied
Provide super-fast and customized search for each character input
to more than 10,000 departments in S group
Supplier
(domestic/foreign)
In-stock or
supply
information
Character Input
“M E M O R Y”
Customized Instant Search by Department
• Instant Search
Recommend relevant item for
each character input
Attachment, manual, product detail, content,
etc
Text block Search
• Text block
chain search
34. (non)login Search
Cart Purchase
Recommending Target books
Managing User/Group Behavior Pattern
Personal
Pattern
Real time
recommendation
Recommend
Filter
Group
Pattern
Steve Jobs
Search
- Goods Attributes
- MD managing
points
- Target DB
Book Recommendation Service RERE
category
author
Recommendation
Accuracy
(including MD )
event
Search
Pattern
Recommend
Filter
Recommend
Filter
Real time
Behavior log
CRM/log
DB
Product
property
Utilizing input keyword, click data and purchase history
10. 6. am 07: 00
Data propagation presentation… book purchase / iphone4S order
Real time pattern
am 07: 10
Category/author/event score
curve
am 08: 30
Real time recommendation
MD manual
recommendation
Learning
35. Similar Trademark Search OzMarker
Example
KR trade mark search system
About 5 million trademarks: invert file creation time
Algorithm | Current mechanism with data mining | OzMarker
Invert file creation | 5 mil * (38 = 8 digit * 3 similar char) = 3.2 bil indices | 5 mil indices
Creation time | 3.2 bil * 0.00003 sec/case = 26.7 hr | 5 mil * 0.00003 sec/case = 150 s
Item | Current | OzMarker
Processing mechanism | Similar character indexing | Typo correction algorithm (better than Google typo correction)
Indexing time and size (KR trademark, 5 million data) | About 24 hr, 10 GB | About 100 sec, 500 MB
Accuracy | Depends on similarity definition | Similar character and similarly pronounced words algorithm; independent of languages
Ease of use | Delay of new trademark registration | Registration of new trademark within a few seconds
Expandability | Applying a new pattern requires overall indexing | No overall indexing; independent of languages
36. Big data in Memory solution, AiM
1. Basic Fn(Ansi Query)
2. User Defined Fn
3. Statistics Fn
4. Other data analysis
Fn
User
Defined
Basic
Fn
+ =
- > <
Data analysis tool
Add
New pattern
High level memory utilization
Efficient memory use
Super light Search
Engine Tech
Massive data analysis part
Search engine part
AiM
Existing Search Engine Tech
General
Solution
Meet user requirements such as data
analysis speed, analysis tool and
statistical methods
Big Data
Structure and Algorithm
Fast and convenient
Editor's Notes
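Example of memory utilization
Possible with open source? = Contrary to generalization
Even if it were possible? Problems???
Algorithm aspect
Found to be impossible with the current system
Patent office similar-trademark lookup
Importance of the algorithm (based on a differentiated structure)
Would this kind of work really be possible with open source?
Actual cases
Justification for using core technology
Endless questions about open source?
Example actually used in our engine
Importance of memory reduction
Establishing the conditions for memory-based Big Data processing
Diagram of the detailed component technologies to be developed
Emphasis on memory: Big Data in memory is needed, but it disappears
What if the memory-based processing capacity per unit server is at or above the per-server processing capacity level?
Target data
What is different?
We do not use a ready-made engine.
Differentiation: this project is proposed with infrastructure comparable to looking into the blueprints and internal structure
What needs to be done? The same for everyone: structured/unstructured data
Comparison of personalized search and general search
General search: operates an index DB that is batch-ordered as needed
Personalized search
Real-time ranking, feedback, search history management
Practical component technologies 1 to 4 are required
Among the various approaches to personalized search, this project's approach centers on using lightweight search engine technology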