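Bit Vector
- A way of representing a set of distinct integers as a sequence of bits
When a bit vector can be used
- The element values must be integers.
- The elements (numbers) must not be duplicated.
- The range of the numbers (the difference between the minimum and the maximum) must be small. (e.g., integers from 1 to 30, or integers from -3 to 10)
- There must be no other information associated with an element. (A bit vector can only record whether an element is present.)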
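Example of converting an integer set to a bit vector
Integer set {2, 5, 8, 10}
Bit position      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15   (bit 0 records whether 0 is present)
Element present         1        1        1     1
Bit vector        0  0  1  0  0  1  0  0  1  0  1  0  0  0  0  0
- The largest element is 10, but the size must be a whole number of bytes (8 bits each), so 2 bytes are used in total.
- In this example, the 2-byte (16-bit) bit vector can represent 16 integers (0 through 15).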
Comparison of an integer array and a bit vector
Integer set {2, 5, 8, 10}                           Storage                   Space efficiency
Integer array a   a[0]=2; a[1]=5; a[2]=8; a[3]=10;  4 bytes * 4 = 16 bytes    Low
Bit vector b      0010 0100 1010 0000               2 bytes                   High
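Indexing and searching documents with bit vectors
Example bit vectors for word size = 8 bits, number of documents = 10, and the queries "A or B or C" and "A and B and C"
Result of indexing the documents (bit vector size = 16 bits, the smallest multiple of 8 that is at least 10):
Document no.      1  2  3  4  5  6  7  8  9 10   (10 documents in total)
Index term A      1  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0
Index term B      1  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0
Index term C      0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0
A or B or C       1  0  0  1  1  0  0  0  1  1  0  0  0  0  0  0   (result of the bitwise operation A | B | C)
A and B and C     0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0   (result of the bitwise operation A & B & C)
- For the query "A or B or C", the retrieved documents are 1, 4, 5, 9, and 10.
- For the query "A and B and C", the only retrieved document is 10.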
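Performance comparison of bit vectors and hashing
Algorithmic complexity
                                    Hashing             Bit vector
                                    Worst     Average   Worst     Average
Insert / delete / search            O(N)      O(C)      O(C)      O(C)
Union / intersection / difference   O(N^2)    O(N)      O(W)      O(W)
Creation cost                       O(B)      O(B)      O(W)      O(W)
Copy cost                           O(N)      O(N)      O(W)      O(W)
C: a constant
N: number of elements
B: number of buckets
W: number of words (number of representable elements / word size in bits)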
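Space requirements
Rows: range size (range of element values); columns: number of elements
                        Bit vector                                Hashing (hash table)
Number of elements      2^6           2^10          2^15          2^6           2^10          2^15
Range size 2^8          32            cannot store  cannot store  1,198         4,078         99,310
Range size 2^15         4,096         4,096         4,096         1,262         5,102         132,078
Range size 2^31         268,435,456   268,435,456   268,435,456   1,390         7,100         197,614
Bit vector: proportional to the range size (one bit per possible value, so 2^8 bits = 32 bytes)
Hashing: grows with the number of elements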
Hashing, worst case
-> all elements fall into a single bucket
Hashing, average case
-> no bucket holds more than one element
-> i.e., when N < B
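// C source for convenient use of a bit vector
// bit_vector.h
#define BITPERWORD 32 // bits per word; assumes unsigned int is 32 bits wide
#define SHIFT_FIVE 5 // i >> 5 == i / 32: index of the word that holds bit i
#define MASK_BYTE 0x1F // 0001 1111; i & 0x1F == i % 32: position of bit i within its word
// Set bit i to 1 (add i to the set).
static void SetBit(unsigned int bitVector[], int i)
{
    bitVector[i>>SHIFT_FIVE] |= ( 1u<<(i&MASK_BYTE) );
}
// Clear bit i to 0 (remove i from the set).
static void ClearBit(unsigned int* bitVector, int i)
{
    bitVector[i>>SHIFT_FIVE] &= ~( 1u<<(i&MASK_BYTE) );
}
// Return 1 if bit i is set, 0 otherwise.
static int GetBit(const unsigned int* bitVector, int i)
{
    return ( bitVector[i>>SHIFT_FIVE] >> (i&MASK_BYTE) ) & 1u;
}
// Code that actually uses it
#include "bit_vector.h"
#define BITCOUNT 1000000 // range of integer values to store
int main(void)
{
    unsigned int bitVector[BITCOUNT/BITPERWORD+1] = {0}; // all bits start cleared
    int bit;
    SetBit(bitVector, 5); // pass the array itself, not its address
    bit = GetBit(bitVector, 5); // bit == 1
    ClearBit(bitVector, 5);
    bit = GetBit(bitVector, 5); // bit == 0
    (void)bit; // silence the unused-value warning
    return 0;
}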
