BSDCONV
Kuan-Chung Chiu
(buganini at gmail dot com)
Contents
1 Syntax
1.1 Phases & Cascade
1.2 Codecs & Fallback
1.3 Codec argument
2 Type & Flag
2.1 Type
2.2 Flag
2.3 Helper codecs
3 C Programming guide
3.1 Conversion instance lifecycle
3.2 Skeleton
3.3 Output mode
3.3.1 BSDCONV_HOLD
3.3.2 BSDCONV_AUTOMALLOC
3.3.3 BSDCONV_PREMALLOCED
3.3.4 BSDCONV_FILE
3.3.5 BSDCONV_FD
3.3.6 BSDCONV_NULL
3.3.7 BSDCONV_PASS
3.4 Counters
3.5 Memory pool issue
1 Syntax
1.1 Phases & Cascade
bsdconv defines three types of conversion phases: from, inter, and to. The
from phase takes a byte sequence and decodes it into a list of code points
(except for from/PASS); conversely, the to phase encodes the list of code
points back into a byte sequence. The inter phase maps code points to code
points.
A basic conversion consists of a from phase and a to phase. Codec names are
matched case-insensitively.
ISO-8859-1 : UTF-8
from to
Figure 1: Basic two-phase conversion
Between the from and to phases, there can be an inter phase.
UTF-8 : UPPER : UTF-8
from inter to
Figure 2: Conversion with inter-mapping phase
There can be more than one inter phase.
UTF-8 : UPPER : FULL : UTF-8
from inter inter to
Figure 3: Conversion with multiple inter-mapping phases
An inter phase can also be used on its own, mostly programmatically.
HALF
inter
Figure 4: Standalone inter-mapping phase
Conversions can be cascaded with the pipe symbol. In most cases this is
equivalent to a shell pipe, except when codecs that manipulate flags
(described in section 2.2) are used.
UTF-8 : BIG5 | BIG5 : UTF-8
from to from to
Figure 5: Cascaded conversions
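The structure of a conversion string can be illustrated with a small scanner. This is only a sketch: count_phases is a hypothetical helper, not part of libbsdconv, and it assumes that '|' and ':' never appear inside codec names or arguments.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libbsdconv): scans a conversion string
 * and records how many phases each cascaded conversion contains.
 * '|' separates conversions, ':' separates phases within a conversion.
 * Returns the number of conversions found; at most max are recorded. */
static size_t count_phases(const char *spec, size_t counts[], size_t max)
{
    size_t n = 0;       /* index of the current conversion */
    size_t phases = 1;  /* a conversion has at least one phase */

    for (const char *p = spec; ; p++) {
        if (*p == ':') {
            phases++;
        } else if (*p == '|' || *p == '\0') {
            if (n < max)
                counts[n] = phases;
            n++;
            phases = 1;
            if (*p == '\0')
                break;
        }
    }
    return n;
}
```

For the string of Figure 5 this reports two conversions with two phases each; for the standalone inter phase of Figure 4 it reports one conversion with a single phase.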
ASCII-compatible codecs are designed to exclude the ASCII part and are named
_FOO, with the alias FOO ⇒ _FOO,ASCII or ASCII,_FOO.
1.2 Codecs & Fallback
A phase consists of one or more codecs, separated by commas. A later codec
is used if and only if the earlier codecs fail to consume the incoming data.
Once a codec finishes its task, matching starts again from the first codec
for the upcoming data.
UTF-8 : ASCII , 3F
from to
Figure 6: Fallback codec
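The fallback rule can be sketched as a loop over function pointers. Everything here is illustrative and not libbsdconv code: codec_fn, to_ascii, to_3f and run_phase are hypothetical names, and the sketch assumes that the fallback codec 3F simply emits the byte 0x3F ('?').

```c
#include <stddef.h>

/* A codec tries to encode one code point. It returns the number of bytes
 * written to out, or 0 if it cannot handle the code point. */
typedef size_t (*codec_fn)(unsigned long cp, unsigned char *out);

/* Stand-in for to/ASCII: accepts only U+0000..U+007F. */
static size_t to_ascii(unsigned long cp, unsigned char *out)
{
    if (cp > 0x7F)
        return 0;
    out[0] = (unsigned char)cp;
    return 1;
}

/* Stand-in for the fallback codec 3F (assumed to emit the byte 0x3F). */
static size_t to_3f(unsigned long cp, unsigned char *out)
{
    (void)cp;
    out[0] = 0x3F;
    return 1;
}

/* Runs one phase: for every code point, start again from the first codec
 * and move on to the next codec only when the previous one fails.
 * In this sketch out must hold at least one byte per code point. */
static size_t run_phase(const codec_fn *codecs, size_t ncodecs,
                        const unsigned long *cps, size_t ncps,
                        unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < ncps; i++) {
        for (size_t c = 0; c < ncodecs; c++) {
            size_t n = codecs[c](cps[i], out + len);
            if (n > 0) {
                len += n;
                break;
            }
        }
    }
    return len;
}
```

With the codec list { to_ascii, to_3f }, the code points U+0041, U+2200, U+0042 come out as the bytes "A?B": the fallback handles U+2200 only, and the first codec is tried first again for U+0042.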
1.3 Codec argument
Some codecs take arguments, which follow the hash symbol.
UTF-8 : ASCII , ANY#3F
Figure 7: Passing argument to codec
Some codecs take arguments in key-value form. Argument names and values
consist of digits, letters, hyphens, and underscores; binary data is
represented in hexadecimal form.
UTF-8 : ASCII , ESCAPE#PREFIX=2575
Figure 8: Passing argument to codec in key-value form
Multiple arguments can be passed by joining them with ampersands.
UTF-8 : ASCII , ESCAPE#PREFIX=262378&SUFFIX=3B
Figure 9: Passing multiple arguments to codec
A list of data can be passed in dot-separated form.
ANY#013F.0121 : ASCII
Figure 10: Data list
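To read the hexadecimal arguments in Figures 8 and 9, it helps to decode them into bytes. The following is a minimal sketch; hex_value and decode_hex are hypothetical helpers written for this illustration and are not the library's own argument parser.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decodes a hex string such as "262378" into bytes.
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * malformed (odd length, non-hex digit) or does not fit into cap bytes. */
static size_t decode_hex(const char *hex, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (hex[0] != '\0') {
        int hi = hex_value(hex[0]);
        int lo = (hex[1] != '\0') ? hex_value(hex[1]) : -1;
        if (hi < 0 || lo < 0 || n >= cap)
            return (size_t)-1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        hex += 2;
    }
    return n;
}
```

Decoded this way, PREFIX=2575 is the two bytes "%u", while PREFIX=262378 and SUFFIX=3B are "&#x" and ";".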
2 Type & Flag
2.1 Type
A code point packet records its type in its first byte.
ID  Description                 Provider (from)    Consumer (to)
00  Bsdconv special characters  BSDCONV-KEYWORD    BSDCONV-KEYWORD
01  Unicode                     Most decoders      Most encoders
02  CNS11643 [1]                CNS11643           CNS11643
03  Byte                        BYTE; ESCAPE       BYTE; ESCAPE#FOR=BYTE
04  Chinese components          inter/ZH-DECOMP    inter/ZH-COMP
1B  ANSI control sequence       ANSI-CONTROL       -
Table 1: Types and their providers/consumers (just to name a few)
Entity  Unicode  UTF-8 hex
%       U+0025   25
A       U+0041   41
∀       U+2200   E28880

Input (UTF-8 literal):        A∀
Decoder (ASCII,BYTE : ...)
Internal data:                01 41 | 03 E2 | 03 88 | 03 80
Encoder (... : ASCII,ESCAPE)
Internal data:                41 ("A") | 25 45 32 ("%E2") | 25 38 38 ("%88") | 25 38 30 ("%80")
Output (UTF-8 literal):       A%E2%88%80
Figure 11: Fallback & Type
[1] For the intersection of CNS11643 and Unicode, from/CNS11643 converts to
the Unicode type where possible. Conversely, to/CNS11643 converts from the
Unicode type where possible.
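The flow of Figure 11 can be imitated in a few lines of C. This is a simplified sketch, not the library's data structure: struct packet here holds a type byte and a single payload byte, and decode_ascii_byte and encode_ascii_escape are hypothetical stand-ins for the "ASCII,BYTE" from phase and the "ASCII,ESCAPE" to phase.

```c
#include <stdio.h>
#include <stddef.h>

enum { TYPE_UNICODE = 0x01, TYPE_BYTE = 0x03 };  /* IDs from Table 1 */

/* Simplified packet: a type byte followed by one payload byte. */
struct packet {
    unsigned char type;
    unsigned char value;
};

/* Stand-in for from "ASCII,BYTE": ASCII bytes become Unicode packets,
 * every other byte falls back to a byte-typed packet. */
static size_t decode_ascii_byte(const unsigned char *in, size_t len,
                                struct packet *out)
{
    for (size_t i = 0; i < len; i++) {
        out[i].type = (in[i] < 0x80) ? TYPE_UNICODE : TYPE_BYTE;
        out[i].value = in[i];
    }
    return len;
}

/* Stand-in for to "ASCII,ESCAPE": Unicode packets are written as-is,
 * byte packets are written as %XX. out needs room for three characters
 * per packet plus a terminating null byte. */
static size_t encode_ascii_escape(const struct packet *in, size_t n,
                                  char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].type == TYPE_UNICODE)
            out[len++] = (char)in[i].value;
        else
            len += (size_t)sprintf(out + len, "%%%02X",
                                   (unsigned)in[i].value);
    }
    out[len] = '\0';
    return len;
}
```

Feeding the bytes 41 E2 88 80 through both functions yields the packets 01 41, 03 E2, 03 88, 03 80 and then the text A%E2%88%80, matching the figure.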
2.2 Flag
A code point packet carries its own flags. There are currently two flags,
FREE and MARK. The FREE flag indicates that the packet buffer needs to be
recycled or released; it matters only when programming against the library.
The MARK flag is (currently only) added by the codec to/PASS#MARK and used
by the codec from/PASS#UNMARK to identify packets that have already been
decoded and need to be passed through in the from phase.
The code point packet structure, including flags, is retained within cascaded
conversions, but not across a shell pipe. Figure 12 demonstrates the flow of
the conversion "ESCAPE:PASS#MARK&FOR=1,BYTE|PASS#UNMARK,UTF-8:UTF-8".
Entity  Unicode  UTF-8 hex
α       U+03B1   CEB1
β       U+03B2   CEB2

Input (UTF-8 literal):               %u03B1%CE%B2
Decoder (ESCAPE : ...)
Internal data:                       01 03 B1 | 03 CE | 03 B2
Encoder (... : PASS#MARK&FOR=1,BYTE)
Internal data:                       01 03 B1 [MARK] | CE B2
Decoder (PASS#UNMARK,UTF-8 : ...)
Internal data:                       01 03 B1 | 01 03 B2
Encoder (... : UTF-8)
Internal data:                       CE B1 ("α") | CE B2 ("β")
Output (UTF-8 literal):              αβ
Figure 12: Flag, from/PASS & to/PASS
2.3 Helper codecs
The codec from/bsdconv can be used to input the internal data structure, and
the codec to/BSDCONV-OUTPUT can be used to inspect types and flags.
3 C Programming guide
3.1 Conversion instance lifecycle
bsdconv_create()
-> bsdconv_init()
-> set input/output parameters
-> is last chunk? (yes: set flush flag)
-> bsdconv()
-> collect output
-> has next chunk? (yes: next chunk, back to "set input/output parameters")
-> reuse instance? (yes: back to bsdconv_init(); no: bsdconv_destroy())
Figure 13: Conversion instance lifecycle
3.2 Skeleton
#include <stdio.h>      /* BUFSIZ */
#include <bsdconv.h>

struct bsdconv_instance *ins;
char *buf;
size_t len;

ins = bsdconv_create("UTF-8:UPSIDEDOWN:UTF-8");
bsdconv_init(ins);
do {
    /* allocate a fresh input buffer for every chunk */
    buf = bsdconv_malloc(BUFSIZ);
    /*
     * fill data into buf
     * len = filled data length
     */
    ins->input.data = buf;
    ins->input.len = len;
    ins->input.flags |= F_FREE;   /* buffer is to be recycled or released */
    ins->input.next = NULL;
    if (ins->input.len == 0) {    /* last chunk */
        ins->flush = 1;
    }
    /*
     * set output parameter (see section 3.3)
     */
    bsdconv(ins);
    /*
     * collect output (see section 3.3)
     */
} while (ins->flush == 0);
bsdconv_destroy(ins);
For chunked conversion, a new input buffer should be allocated for each
chunk so that its content cannot change during conversion. An output buffer
with the FREE flag is safe to reuse.
3.3 Output mode
ins->output_mode       Description
BSDCONV_HOLD           Hold output in memory
BSDCONV_AUTOMALLOC     Return an output buffer, which should be free()d after use
BSDCONV_PREMALLOCED    Fill output into the given buffer
BSDCONV_FILE           Write output to a (FILE *) stream
BSDCONV_FD             Write output to an (int) file descriptor
BSDCONV_NULL           Discard output
BSDCONV_PASS           Pass output to another conversion instance
3.3.1 BSDCONV_HOLD
This is the default output mode after bsdconv_init(). It is usually used
together with BSDCONV_AUTOMALLOC or BSDCONV_PREMALLOCED to get the squeezed
output.
3.3.2 BSDCONV_AUTOMALLOC
The output buffer is allocated dynamically. The actual buffer size is
ins->output.len plus the output content length, which is useful when you
need room for a terminating null byte.
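The sizing rule can be pictured with a plain C stand-in. alloc_with_room is a hypothetical function written for this illustration, not a libbsdconv call; its extra parameter plays the role that the text gives to ins->output.len.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the sizing rule described above: allocate
 * extra + content_len bytes and copy the content to the front of the
 * buffer. Returns NULL if the allocation fails. */
static char *alloc_with_room(const char *content, size_t content_len,
                             size_t extra)
{
    char *buf = malloc(extra + content_len);
    if (buf == NULL)
        return NULL;
    memcpy(buf, content, content_len);
    return buf;
}
```

Requesting one extra byte leaves exactly enough room to store '\0' at index content_len, so the result can be used as a C string.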
3.3.3 BSDCONV_PREMALLOCED
If ins->output.data is NULL, the total length of the content to be output is
stored in ins->output.len, but the output is still held in memory. Otherwise,
bsdconv() fills in as much unfragmented data as possible within the buffer
size limit specified in ins->output.len.
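One plausible reading of "unfragmented" is that held output pieces are copied whole or not at all. The sketch below illustrates that reading only; struct chunk and fill_unfragmented are hypothetical and do not reproduce the library's internals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of held output pieces. */
struct chunk {
    const char *data;
    size_t len;
    const struct chunk *next;
};

/* With out == NULL, returns the total length of all chunks (the "query"
 * case). Otherwise copies whole chunks into out while they still fit
 * into cap bytes and returns the number of bytes copied. A chunk is
 * never split across the size limit. */
static size_t fill_unfragmented(const struct chunk *head, char *out,
                                size_t cap)
{
    size_t used = 0;
    for (const struct chunk *c = head; c != NULL; c = c->next) {
        if (out != NULL) {
            if (c->len > cap - used)
                break;                 /* would not fit: stop here */
            memcpy(out + used, c->data, c->len);
        }
        used += c->len;
    }
    return used;
}
```

A caller can first query the total length with a NULL buffer, allocate accordingly, and then call again to receive the data, which mirrors the two cases described above.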
3.3.4 BSDCONV_FILE
Output is written with fwrite() to the FILE * given in ins->output.data.
3.3.5 BSDCONV_FD
Output is written with write() to the (int) file descriptor given in
ins->output.data. Casting to intptr_t (defined in <stdint.h>) is needed to
avoid a compiler warning.
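The cast can be shown in isolation. This sketch assumes that the data field is a pointer-typed slot; fd_to_ptr and ptr_to_fd are hypothetical helper names used only here.

```c
#include <stdint.h>

/* Stores an int file descriptor in a pointer-sized slot. Going through
 * intptr_t avoids the "cast to pointer from integer of different size"
 * warning on platforms where int is narrower than a pointer. */
static void *fd_to_ptr(int fd)
{
    return (void *)(intptr_t)fd;
}

/* Recovers the file descriptor from the pointer-sized slot. */
static int ptr_to_fd(void *p)
{
    return (int)(intptr_t)p;
}
```

With such a helper, an assignment of the form ins->output.data = fd_to_ptr(fd) would compile without the warning mentioned above.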
3.3.6 BSDCONV_NULL
Output is discarded. This is usually used when evaluating a conversion (see
section 3.4).
3.3.7 BSDCONV_PASS
Output packets are passed to the (struct bsdconv_instance *) conversion
instance given in ins->output.data.
3.4 Counters
Counters are kept in ins->counter as a linked list with the following structure.
struct bsdconv_counter_entry {
char *key;
bsdconv_counter_t val;
struct bsdconv_counter_entry *next;
};
IERR and OERR are mandatory error counters.
There are two APIs to get or reset counters:
bsdconv_counter_t *bsdconv_counter(char *name);
Returns a pointer to the counter value. bsdconv_counter_t is currently defined
as size_t.
void bsdconv_counter_reset(char *name);
Resets the specified counter; if name is NULL, all counters are reset.
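How such a list can be searched and reset is sketched below. The sketch mirrors the structure above under local names (counter_t, struct counter_entry) so that it compiles without bsdconv.h; find_counter and reset_counters are hypothetical helpers, and matching names with strcmp() is an assumption.

```c
#include <stddef.h>
#include <string.h>

typedef size_t counter_t;          /* the text defines the type as size_t */

/* Local mirror of struct bsdconv_counter_entry. */
struct counter_entry {
    char *key;
    counter_t val;
    struct counter_entry *next;
};

/* Returns a pointer to the value of the counter called name,
 * or NULL if the list has no such entry. */
static counter_t *find_counter(struct counter_entry *head, const char *name)
{
    for (struct counter_entry *e = head; e != NULL; e = e->next) {
        if (strcmp(e->key, name) == 0)
            return &e->val;
    }
    return NULL;
}

/* Resets the counter called name, or every counter if name is NULL. */
static void reset_counters(struct counter_entry *head, const char *name)
{
    for (struct counter_entry *e = head; e != NULL; e = e->next) {
        if (name == NULL || strcmp(e->key, name) == 0)
            e->val = 0;
    }
}
```

Because the lookup returns a pointer rather than a copy, a caller can both read and update a counter such as IERR through it.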
3.5 Memory pool issue
In case libbsdconv and your program use different memory pools, bsdconv_malloc()
and bsdconv_free() should be used in place of malloc() and free().
9

More Related Content

What's hot

N_Asm Assembly arithmetic instructions (sol)
N_Asm Assembly arithmetic instructions (sol)N_Asm Assembly arithmetic instructions (sol)
N_Asm Assembly arithmetic instructions (sol)Selomon birhane
 
Assembly Language Lecture 2
Assembly Language Lecture 2Assembly Language Lecture 2
Assembly Language Lecture 2Motaz Saad
 
Introduction to 8088 microprocessor
Introduction to 8088 microprocessorIntroduction to 8088 microprocessor
Introduction to 8088 microprocessorDwight Sabio
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...Bilal Amjad
 
Chapter 6 Flow control Instructions
Chapter 6 Flow control InstructionsChapter 6 Flow control Instructions
Chapter 6 Flow control Instructionswarda aziz
 
Embedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersEmbedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersKamesh Mtec
 
Chapter 2 The 8088 Microprocessor
Chapter 2   The 8088 MicroprocessorChapter 2   The 8088 Microprocessor
Chapter 2 The 8088 MicroprocessorDwight Sabio
 
Instruction set of 8086
Instruction set of 8086Instruction set of 8086
Instruction set of 80869840596838
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1Motaz Saad
 
1344 Alp Of 8086
1344 Alp Of 80861344 Alp Of 8086
1344 Alp Of 8086techbed
 
X86 assembly & GDB
X86 assembly & GDBX86 assembly & GDB
X86 assembly & GDBJian-Yu Li
 
Assembly Language Lecture 4
Assembly Language Lecture 4Assembly Language Lecture 4
Assembly Language Lecture 4Motaz Saad
 

What's hot (20)

Embedded c
Embedded cEmbedded c
Embedded c
 
N_Asm Assembly arithmetic instructions (sol)
N_Asm Assembly arithmetic instructions (sol)N_Asm Assembly arithmetic instructions (sol)
N_Asm Assembly arithmetic instructions (sol)
 
Assembly Language Lecture 2
Assembly Language Lecture 2Assembly Language Lecture 2
Assembly Language Lecture 2
 
Introduction to 8088 microprocessor
Introduction to 8088 microprocessorIntroduction to 8088 microprocessor
Introduction to 8088 microprocessor
 
Microcontroller part 4
Microcontroller part 4Microcontroller part 4
Microcontroller part 4
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 4 (Introduction ...
 
Chapter 6 Flow control Instructions
Chapter 6 Flow control InstructionsChapter 6 Flow control Instructions
Chapter 6 Flow control Instructions
 
Embedded c program and programming structure for beginners
Embedded c program and programming structure for beginnersEmbedded c program and programming structure for beginners
Embedded c program and programming structure for beginners
 
Lecture5(1)
Lecture5(1)Lecture5(1)
Lecture5(1)
 
Chapter 2 The 8088 Microprocessor
Chapter 2   The 8088 MicroprocessorChapter 2   The 8088 Microprocessor
Chapter 2 The 8088 Microprocessor
 
Lecture6
Lecture6Lecture6
Lecture6
 
Microcontroller part 6_v1
Microcontroller part 6_v1Microcontroller part 6_v1
Microcontroller part 6_v1
 
FPGA - Programmable Logic Design
FPGA - Programmable Logic DesignFPGA - Programmable Logic Design
FPGA - Programmable Logic Design
 
Instruction set of 8086
Instruction set of 8086Instruction set of 8086
Instruction set of 8086
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1
 
1344 Alp Of 8086
1344 Alp Of 80861344 Alp Of 8086
1344 Alp Of 8086
 
X86 assembly & GDB
X86 assembly & GDBX86 assembly & GDB
X86 assembly & GDB
 
Introduction to HDLs
Introduction to HDLsIntroduction to HDLs
Introduction to HDLs
 
Assembly Language Lecture 4
Assembly Language Lecture 4Assembly Language Lecture 4
Assembly Language Lecture 4
 

Similar to Bsdconv

Error Resiliency and Concealment in H.264 MPEG-4 Part 10
Error Resiliency and Concealment in H.264 MPEG-4 Part 10Error Resiliency and Concealment in H.264 MPEG-4 Part 10
Error Resiliency and Concealment in H.264 MPEG-4 Part 10coldfire7
 
Notes of 8085 micro processor Programming for BCA, MCA, MSC (CS), MSC (IT) &...
Notes of 8085 micro processor Programming  for BCA, MCA, MSC (CS), MSC (IT) &...Notes of 8085 micro processor Programming  for BCA, MCA, MSC (CS), MSC (IT) &...
Notes of 8085 micro processor Programming for BCA, MCA, MSC (CS), MSC (IT) &...ssuserd6b1fd
 
8085 micro processor
8085 micro processor8085 micro processor
8085 micro processorArun Umrao
 
20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docxMostafaParvin1
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet FiltersKernel TLV
 
ADS1256 library documentation
ADS1256 library documentationADS1256 library documentation
ADS1256 library documentationCuriousScientist
 
Assembly Codes in C Programmes - A Short Notes by Arun Umrao
Assembly Codes in C Programmes - A Short Notes by Arun UmraoAssembly Codes in C Programmes - A Short Notes by Arun Umrao
Assembly Codes in C Programmes - A Short Notes by Arun Umraossuserd6b1fd
 
Bascom avr-course
Bascom avr-courseBascom avr-course
Bascom avr-coursehandson28
 
VJITSk 6713 user manual
VJITSk 6713 user manualVJITSk 6713 user manual
VJITSk 6713 user manualkot seelam
 
Basic Interoperable Scrambling System
Basic Interoperable Scrambling SystemBasic Interoperable Scrambling System
Basic Interoperable Scrambling SystemSais Abdelkrim
 
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdf
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdfTutorial-Auto-Code-Generation-for-F2803x-Target.pdf
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdfmounir derri
 
Image compression1.ppt
Image compression1.pptImage compression1.ppt
Image compression1.pptssuser812128
 
OpenWRT manual
OpenWRT manualOpenWRT manual
OpenWRT manualfosk
 

Similar to Bsdconv (20)

Pcbgcode
PcbgcodePcbgcode
Pcbgcode
 
Error Resiliency and Concealment in H.264 MPEG-4 Part 10
Error Resiliency and Concealment in H.264 MPEG-4 Part 10Error Resiliency and Concealment in H.264 MPEG-4 Part 10
Error Resiliency and Concealment in H.264 MPEG-4 Part 10
 
Notes of 8085 micro processor Programming for BCA, MCA, MSC (CS), MSC (IT) &...
Notes of 8085 micro processor Programming  for BCA, MCA, MSC (CS), MSC (IT) &...Notes of 8085 micro processor Programming  for BCA, MCA, MSC (CS), MSC (IT) &...
Notes of 8085 micro processor Programming for BCA, MCA, MSC (CS), MSC (IT) &...
 
8085 micro processor
8085 micro processor8085 micro processor
8085 micro processor
 
20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
ADS1256 library documentation
ADS1256 library documentationADS1256 library documentation
ADS1256 library documentation
 
Compress
CompressCompress
Compress
 
Assembly Codes in C Programmes - A Short Notes by Arun Umrao
Assembly Codes in C Programmes - A Short Notes by Arun UmraoAssembly Codes in C Programmes - A Short Notes by Arun Umrao
Assembly Codes in C Programmes - A Short Notes by Arun Umrao
 
Interprocess Message Formats
Interprocess Message FormatsInterprocess Message Formats
Interprocess Message Formats
 
Multi Process Message Formats
Multi Process Message FormatsMulti Process Message Formats
Multi Process Message Formats
 
Bascom avr-course
Bascom avr-courseBascom avr-course
Bascom avr-course
 
VJITSk 6713 user manual
VJITSk 6713 user manualVJITSk 6713 user manual
VJITSk 6713 user manual
 
Basic Interoperable Scrambling System
Basic Interoperable Scrambling SystemBasic Interoperable Scrambling System
Basic Interoperable Scrambling System
 
OPCDE Crackme Solution
OPCDE Crackme SolutionOPCDE Crackme Solution
OPCDE Crackme Solution
 
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdf
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdfTutorial-Auto-Code-Generation-for-F2803x-Target.pdf
Tutorial-Auto-Code-Generation-for-F2803x-Target.pdf
 
Image compression1.ppt
Image compression1.pptImage compression1.ppt
Image compression1.ppt
 
Lb35189919904
Lb35189919904Lb35189919904
Lb35189919904
 
vorlage
vorlagevorlage
vorlage
 
OpenWRT manual
OpenWRT manualOpenWRT manual
OpenWRT manual
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Bsdconv

  • 1. BSDCONV Kuan-Chung Chiu (buganini at gmail dot com) Contents 1 Syntax 1 1.1 Phases & Cascade . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Codecs & Fallback . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Codec argument . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Type & Flag 3 2.1 Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 Helper codecs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 C Programming guide 6 3.1 Conversion instance lifecycle . . . . . . . . . . . . . . . . . . . . . 6 3.2 Skeleton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Output mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3.1 BSDCONV HOLD . . . . . . . . . . . . . . . . . . . . . . 8 3.3.2 BSDCONV AUTOMALLOC . . . . . . . . . . . . . . . . 8 3.3.3 BSDCONV PREMALLOCED . . . . . . . . . . . . . . . 8 3.3.4 BSDCONV FILE . . . . . . . . . . . . . . . . . . . . . . . 8 3.3.5 BSDCONV FD . . . . . . . . . . . . . . . . . . . . . . . . 8 3.3.6 BSDCONV NULL . . . . . . . . . . . . . . . . . . . . . . 8 3.3.7 BSDCONV PASS . . . . . . . . . . . . . . . . . . . . . . 8 3.4 Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.5 Memory pool issue . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1 Syntax 1.1 Phases & Cascade There are three types of conversion phases defined in bsdconv: from, inter, to. The from phase takes byte sequence and decodes it into a list of code points (except for from/PASS), on the other hand, the to phase encodes the list of code points back to byte sequence. The inter phase does code point to code point mapping. 1
  • 2. A basic conversion consists of from and to phases. Search of codec name is case insensitive. ISO-8859-1 : UTF-8 from to Figure 1: Basic two phases conversion Between from and to phases, we can have an inter phase. UTF-8 : UPPER : UTF-8 from inter to Figure 2: Conversion with inter-mapping phase There can be more than one inter phases. UTF-8 : UPPER : FULL : UTF-8 from inter inter to Figure 3: Conversion with multiple inter-mapping phases An inter phase can be used standalonely, mostly in programmatic way. HALF inter Figure 4: Standalone inter-mapping phase Conversions can be cascaded with pipe symbol. In most cases it is equivalent to shell pipe unless the use of codecs manipulating flag (described in section 2.2). UTF-8 : BIG5 | BIG5 : UTF-8 from to from to Figure 5: Cascaded conversions ASCII-compatible codecs are designed to exclude ASCII part and named as FOO, with alias FOO ⇒ FOO,ASCII or ASCII, FOO. 2
  • 3. 1.2 Codecs & Fallback A phase consists of one or more codecs, separated by comma. The latter codecs will be utilized if and only if the former codecs fail to consume the incoming data, once a codec finish its task, the first codec will be up again for upcoming data. UTF-8 : ASCII , 3F from to Figure 6: Fallback codec 1.3 Codec argument Some codecs take arguments, after the hash symbol. UTF-8 : ASCII , ANY#3F Figure 7: Passing argument to codec Some codecs take arguments in key-value form. Argument name and value consist of numbers, alphabets, hyphen and underscore, binary data are repre- sented in hexadecimal form. UTF-8 : ASCII , ESCAPE#PREFIX=2575 Figure 8: Passing argument to codec in key-value form Multiple arguments can be passed by being concatenated with ampersand. UTF-8 : ASCII , ESCAPE#PREFIX=262378&SUFFIX=3B Figure 9: Passing multiple arguments to codec List of data can be passed in dot-separated form. ANY#013F.0121 : ASCII Figure 10: Data list 3
  • 4. 2 Type & Flag
    2.1 Type
    A code point packet records its type in its first byte.

        ID  Description                 Provider (from)   Consumer (to)
        00  Bsdconv special characters  BSDCONV-KEYWORD   BSDCONV-KEYWORD
        01  Unicode                     Most decoders     Most encoders
        02  CNS11643 [1]                CNS11643          CNS11643
        03  Byte                        BYTE; ESCAPE      BYTE; ESCAPE#FOR=BYTE
        04  Chinese components          inter/ZH-DECOMP   inter/ZH-COMP
        1B  ANSI control sequence       ANSI-CONTROL      -
    Table 1: Types and their providers/consumers (just to name a few)

        Entity  Unicode  UTF-8 Hex
        %       U+0025   25
        A       U+0041   41
        ∀       U+2200   E28880

        Input (UTF-8 literal):   A∀
        Decoder:                 ASCII,BYTE : ...
        Internal data:           01 41 03 E2 03 88 03 80
        Encoder:                 ... : ASCII,ESCAPE
        Internal data:           41 "A" 25 45 32 "%E2" 25 38 38 "%88" 25 38 30 "%80"
        Output (UTF-8 literal):  A%E2%88%80
    Figure 11: Fallback & Type

    [1] For the intersection of CNS11643 and Unicode, from/CNS11643 converts to the Unicode type if possible. Conversely, to/CNS11643 converts from the Unicode type if possible.
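The flow of Figure 11 can be imitated with a toy model in C. This is only a sketch under simplifying assumptions, not how libbsdconv is implemented: every packet is stored as exactly two bytes (type, data), only the Unicode (01) and Byte (03) types from Table 1 are modelled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { TYPE_UNICODE = 0x01, TYPE_BYTE = 0x03 };	/* IDs from Table 1 */

/*
 * Toy "ASCII,BYTE" from phase: each input byte becomes a two-byte packet.
 * ASCII accepts bytes below 0x80 (type 01); BYTE is the fallback (type 03).
 * `packets` must have room for 2 * len bytes. Returns the bytes written.
 */
size_t decode_ascii_byte(const unsigned char *in, size_t len,
                         unsigned char *packets)
{
	assert(in != NULL && packets != NULL);
	size_t n = 0;
	for (size_t i = 0; i < len; i++) {
		packets[n++] = (in[i] < 0x80) ? TYPE_UNICODE : TYPE_BYTE;
		packets[n++] = in[i];
	}
	return n;
}

/*
 * Toy "ASCII,ESCAPE" to phase: ASCII emits Unicode packets below 0x80 as is;
 * ESCAPE is the fallback and writes Byte packets as %XX.
 * `out` must have room for 3 * (plen / 2) + 1 chars.
 * Returns the string length, or (size_t)-1 if neither codec accepts a packet.
 */
size_t encode_ascii_escape(const unsigned char *packets, size_t plen,
                           char *out)
{
	assert(packets != NULL && out != NULL);
	size_t n = 0;
	for (size_t i = 0; i + 1 < plen; i += 2) {
		unsigned char type = packets[i];
		unsigned char data = packets[i + 1];
		if (type == TYPE_UNICODE && data < 0x80)
			out[n++] = (char)data;
		else if (type == TYPE_BYTE)
			n += (size_t)sprintf(out + n, "%%%02X", (unsigned)data);
		else
			return (size_t)-1;
	}
	out[n] = '\0';
	return n;
}
```

Feeding the four input bytes 41 E2 88 80 through both functions reproduces the internal data and the output string shown in Figure 11.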
  • 5. 2.2 Flag
    A code point packet carries its own flags. Currently there are two flags, FREE and MARK. The FREE flag indicates that the packet buffer needs to be recycled or released; it matters only when programming is involved. The MARK flag is (currently only) added by codec to/PASS#MARK and used by codec from/PASS#UNMARK to identify which packets have already been decoded and need to be passed through in the from phase. The code point packet structure, including flags, is retained within cascaded conversions, but not across a shell pipe. Figure 12 demonstrates the flow of the conversion "ESCAPE:PASS#MARK&FOR=1,BYTE|PASS#UNMARK,UTF-8:UTF-8".

        Entity  Unicode  UTF-8 Hex
        α       U+03B1   CEB1
        β       U+03B2   CEB2

        Input (UTF-8 literal):   %u03B1%CE%B2
        Decoder:                 ESCAPE : ...
        Internal data:           01 03 B1 03 CE 03 B2
        Encoder:                 ... : PASS#MARK&FOR=1,BYTE
        Internal data:           01 03 B1 MARK CE B2
        Decoder:                 PASS#UNMARK,UTF-8 : ...
        Internal data:           01 03 B1 01 03 B2
        Encoder:                 ... : UTF-8
        Internal data:           CE B1 "α" CE B2 "β"
        Output (UTF-8 literal):  αβ
    Figure 12: Flag, from/PASS & to/PASS
  • 6. 2.3 Helper codecs
    Codec from/bsdconv can be used to input internal data structure, and codec to/BSDCONV-OUTPUT can be used to inspect type and flags.

    3 C Programming guide
    3.1 Conversion instance lifecycle

        bsdconv_create()
        bsdconv_init()
        set input/output parameters
        is last chunk? (yes: set flush flag)
        bsdconv()
        collect output
        has next chunk? (yes: next chunk; no: reuse instance or bsdconv_destroy())
    Figure 13: Conversion instance lifecycle
  • 7. 3.2 Skeleton

        #include <bsdconv.h>

        bsdconv_instance *ins;
        char *buf;
        size_t len;

        ins=bsdconv_create("UTF-8:UPSIDEDOWN:UTF-8");
        bsdconv_init(ins);
        do{
            buf=bsdconv_malloc(BUFSIZ);
            /*
             * fill data into buf
             * len=filled data length
             */
            ins->input.data=buf;
            ins->input.len=len;
            ins->input.flags|=F_FREE;
            ins->input.next=NULL;
            if(ins->input.len==0){ // last chunk
                ins->flush=1;
            }
            /*
             * set output parameter (see section 3.3)
             */
            bsdconv(ins);
            /*
             * collect output (see section 3.3)
             */
        }while(ins->flush==0);
        bsdconv_destroy(ins);

    For chunked conversion, an input buffer should be allocated for each input to prevent its content from changing during conversion. An output buffer with flag FREE is safe to reuse.

    3.3 Output mode

        ins->output_mode     Description
        BSDCONV_HOLD         Hold output in memory
        BSDCONV_AUTOMALLOC   Return output buffer, which should be released with free() after use
        BSDCONV_PREMALLOCED  Fill output into given buffer
        BSDCONV_FILE         Write output into (FILE *) stream file
        BSDCONV_FD           Write output into (int) file descriptor
        BSDCONV_NULL         Discard output
        BSDCONV_PASS         Pass to another conversion instance
  • 8. 3.3.1 BSDCONV_HOLD
    This is the default output mode after bsdconv_init(). It is usually used together with BSDCONV_AUTOMALLOC or BSDCONV_PREMALLOCED to get squeezed output.

    3.3.2 BSDCONV_AUTOMALLOC
    The output buffer is allocated dynamically. The actual buffer size is ins->output.len plus the output content length, which is useful when you need room for a terminating null byte.

    3.3.3 BSDCONV_PREMALLOCED
    If ins->output.data is NULL, the total length of the content to be output is stored in ins->output.len, but the output is still held in memory. Otherwise, bsdconv() fills as much unfragmented data as possible within the buffer size limit specified in ins->output.len.

    3.3.4 BSDCONV_FILE
    Output is written with fwrite() to the given FILE * at ins->output.data.

    3.3.5 BSDCONV_FD
    Output is written with write() to the given (int) file descriptor at ins->output.data. A cast to intptr_t (defined in <stdint.h>) is needed to avoid a compiler warning.

    3.3.6 BSDCONV_NULL
    Output is discarded. This is usually used when evaluating a conversion (see section 3.4).

    3.3.7 BSDCONV_PASS
    Output packets are passed to the given (struct bsdconv_instance *) conversion instance at ins->output.data.

    3.4 Counters
    Counters are kept in ins->counter as a linked list with the following structure.

        struct bsdconv_counter_entry {
            char *key;
            bsdconv_counter_t val;
            struct bsdconv_counter_entry *next;
        };

    IERR and OERR are mandatory error counters.
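For readers who want to inspect the counter list directly, the following C sketch walks such a linked list and looks up a counter by name. To keep it compilable without bsdconv.h, it redeclares the structure from section 3.4 and assumes that bsdconv_counter_t is size_t; the helper name find_counter and the case-sensitive comparison are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Redeclared here so the sketch compiles on its own;
 * assumes bsdconv_counter_t is size_t. */
typedef size_t bsdconv_counter_t;

struct bsdconv_counter_entry {
	char *key;
	bsdconv_counter_t val;
	struct bsdconv_counter_entry *next;
};

/*
 * Hypothetical helper (not part of libbsdconv): return a pointer to the
 * value of the counter named `key`, or NULL if the list has no such entry.
 */
bsdconv_counter_t *find_counter(struct bsdconv_counter_entry *head,
                                const char *key)
{
	assert(key != NULL);
	for (struct bsdconv_counter_entry *e = head; e != NULL; e = e->next) {
		if (strcmp(e->key, key) == 0)
			return &e->val;
	}
	return NULL;
}
```

Given a list head, a call such as find_counter(head, "IERR") returns a pointer to the input error count, which can then be read or reset in place.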
  • 9. There are two APIs to get or reset counters:

        bsdconv_counter_t * bsdconv_counter(char *name);

    Returns a pointer to the counter value. bsdconv_counter_t is currently defined as size_t.

        void bsdconv_counter_reset(char *name);

    Resets the specified counter; if name is NULL, all counters are reset.

    3.5 Memory pool issue
    In case libbsdconv and your program use different memory pools, bsdconv_malloc() and bsdconv_free() should be used in place of malloc() and free().