tree-sitter-objc-slides.pptx
3. Background
We want to build an automated refactoring tool based on LLVM Pass, to refactor the Toutiao codebase in bulk.
https://semgrep.dev/
Static analysis at ludicrous speed
Find bugs and enforce code standards
Keywords: pattern matching, pattern replacement, lexical analysis, syntax analysis, AST, S-expression
https://github.com/rsc/rf
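To make the AST and S-expression keywords concrete, here is a minimal sketch that prints a hand-built syntax tree in the parenthesized notation tree-sitter uses when it prints syntax trees. The node shape (`type`, `children`) and the node names are assumptions for illustration, not the output of any real grammar.

```javascript
// Hypothetical node shape: { type: string, children?: Node[] }.
function toSExpression(node) {
  const children = node.children || [];
  if (children.length === 0) return `(${node.type})`;
  return `(${node.type} ${children.map(toSExpression).join(" ")})`;
}

// Hand-built tree for an `if` statement; node names are illustrative only.
const tree = {
  type: "if_statement",
  children: [
    { type: "identifier", children: [] },
    { type: "compound_statement", children: [{ type: "expression_statement" }] },
  ],
};

console.log(toSExpression(tree));
// prints "(if_statement (identifier) (compound_statement (expression_statement)))"
```

Pattern matching and replacement tools work on this tree form rather than on raw text, which is why the notation appears next to "pattern matching" in the keyword list.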
4. Introduction to tree-sitter
https://tree-sitter.github.io/tree-sitter/
Tree-sitter is a parser generator tool and an incremental parsing library.
General enough to parse any programming language
Fast enough to parse on every keystroke in a text editor
Robust enough to provide useful results even in the presence of syntax errors
Dependency-free so that the runtime library (which is written in pure C) can be
embedded in any application
JavaScript -> Rust -> C -> *
6. tree-sitter-objc implementation analysis - Environment setup
https://github.com/jiyee/tree-sitter-objc
https://code.byted.org/TTIOS/tree-sitter-objc-toutiao
$ brew install node
$ npm install -g tree-sitter-cli
$ git clone https://github.com/jiyee/tree-sitter-objc
$ cd tree-sitter-objc && npm install && cd ..
$ git clone https://code.byted.org/TTIOS/tree-sitter-objc-toutiao
$ cd tree-sitter-objc-toutiao && npm install && cd ..
$ tree-sitter --help
$ tree-sitter generate
$ tree-sitter test
$ tree-sitter parse <file>
7. tree-sitter-objc implementation analysis - Directory structure
├── Cargo.toml      <- Rust configuration file
├── LICENSE
├── README.md
├── binding.gyp     <- gyp build description
├── bindings/       <- binding sources, C & Rust
├── build/          <- compiled binding output
├── examples/       <- example files
├── grammar.js      <- grammar definition
├── package.json    <- npm configuration file
├── src/            <- generated C output
└── test/           <- unit tests
8. tree-sitter-objc implementation analysis - Vocabulary quiz
• identifier
• declaration
• definition
• type
• type specifier
• type qualifier
• type attribute
• declarator
• statement
• expression
#import <Foundation/Foundation.h>
int main(int argc, char *argv[]) {
    char *string = "string";
    if (string) {
        printf("%s", string);
    }
}
9. tree-sitter-objc implementation analysis - Syntax structure
#import <Foundation/Foundation.h>
@interface ClassName : NSObject
@property (nonatomic, strong) NSString *string;
- (void)print;
@end
@implementation ClassName
- (void)print {
    NSLog(@"%@", self.string);
}
@end
int main(int argc, char *argv[]) {
    ClassName *class = [ClassName new];
    class.string = @"tree-sitter-objc";
    [class print];
}
preproc_import
class interface declaration
    property declaration
    method declaration
class implementation declaration
    method definition
        statement
            expression
function definition
    statement
        expression
...
10. tree-sitter-objc implementation analysis - Zen
Permutations and combinations of a few primitives:
string / regexp / choice / seq / optional / repeat / repeat1 / commaSep / commaSep1
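In a real grammar.js the tree-sitter CLI supplies `seq`, `choice`, `optional`, `repeat` and `repeat1` as globals, while `commaSep` / `commaSep1` are usually small helpers written in the grammar file itself. The sketch below replaces the built-ins with stand-in stubs (an assumption, so that the file runs under plain Node) to show how the two helpers are composed from the primitives. The rule name `parameter_declaration` is hypothetical.

```javascript
// Stand-in stubs for the DSL functions that the tree-sitter CLI normally
// provides as globals; they only record the rule structure as plain objects.
const seq = (...members) => ({ type: "SEQ", members });
const repeat = (content) => ({ type: "REPEAT", content });
const optional = (content) => ({ type: "OPTIONAL", content });

// One or more `rule`, separated by commas: rule (',' rule)*
function commaSep1(rule) {
  return seq(rule, repeat(seq(",", rule)));
}

// Zero or more `rule`, separated by commas.
function commaSep(rule) {
  return optional(commaSep1(rule));
}

// Hypothetical use: a parenthesized, comma-separated parameter list.
const parameterList = seq("(", commaSep("parameter_declaration"), ")");
console.log(JSON.stringify(parameterList, null, 2));
```

The point of the slide holds in the sketch: larger rules are nothing more than nested combinations of the same handful of primitives.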
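13. tree-sitter-objc implementation analysis - Principles
Compiler theory: a brief look at LL and LR grammars
Objective-C is a superset of C; clang can rewrite Objective-C source into C++ (clang -rewrite-objc), although a regular build compiles it to LLVM IR rather than to C++.
Directives are the most common construct in Objective-C code,
for example: #import / @import / #define / #if / __attribute__ and so on
LLVM (clang) front-end pipeline:
preprocess -> lexer -> parser
tree-sitter parsing pipeline:
lexer -> parser; there is no preprocess stage, so macros cannot be expanded before parsing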
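The consequence of the missing preprocess stage can be sketched with a toy tokenizer: a macro that the compiler would expand before parsing reaches a tree-sitter grammar as an ordinary identifier. Everything here is hypothetical: the macro `MY_EXPORT`, the macro table, and the tokenizer are illustrations and are not how clang or tree-sitter are implemented.

```javascript
// Toy tokenizer: identifiers, numbers, and single-character punctuation.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|[^\s\w]/g) || [];
}

// Toy object-like macro expansion, standing in for the preprocess stage.
function expandMacros(source, macros) {
  return source.replace(/[A-Za-z_]\w*/g, (name) =>
    Object.prototype.hasOwnProperty.call(macros, name) ? macros[name] : name
  );
}

const macros = { MY_EXPORT: "extern" }; // hypothetical: #define MY_EXPORT extern
const source = "MY_EXPORT int counter;";

// With preprocessing (compiler front end): the parser sees `extern`.
console.log(tokenize(expandMacros(source, macros)));
// Without preprocessing (tree-sitter): the parser sees the raw macro name.
console.log(tokenize(source));
```

A grammar therefore has to accept macro names wherever they may appear in source, which is what makes directives and macros the hard part of an Objective-C grammar.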
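15. tree-sitter-objc implementation analysis - Open issues
1. A more elegant implementation of the preproc rules, especially #if / #endif, possibly with an external scanner
2. Ordering of the PREC precedence levels
3. A more elegant implementation of preprocessor directives, see "Parsing Preprocessor Directives in Objective-C"
4. parser.c size optimization
5. Handling of the remaining edge-case bad cases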
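On the precedence item: tree-sitter grammars conventionally keep a `PREC` table of named numeric levels and pass them to `prec.left` / `prec.right` to resolve conflicts between operators. As a runnable illustration of why the ordering of such a table matters, the sketch below uses a tiny precedence-climbing parser instead. This is a swapped-in technique (tree-sitter generates an LR-style parser, not this algorithm), and the level values are assumptions.

```javascript
// Hypothetical precedence table: a higher number binds tighter.
const PREC = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Parse a flat token list into a parenthesized string with precedence
// climbing; all operators are treated as left-associative.
function parseExpression(tokens, minPrec = 1) {
  let left = tokens.shift(); // operand
  while (tokens.length > 0 && PREC[tokens[0]] >= minPrec) {
    const op = tokens.shift();
    const right = parseExpression(tokens, PREC[op] + 1);
    left = `(${left} ${op} ${right})`;
  }
  return left;
}

console.log(parseExpression(["1", "+", "2", "*", "3"])); // prints "(1 + (2 * 3))"
console.log(parseExpression(["1", "-", "2", "-", "3"])); // prints "((1 - 2) - 3)"
```

Swapping the numbers for `+` and `*` in the table would change the grouping of the first expression, which is the kind of mistake a carefully ordered PREC table is meant to prevent.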
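16. Possible application scenarios
• Editor syntax highlighting and Cloud IDE code navigation https://github.com/tree-sitter/tree-sitter/issues/139
• Improving the accuracy of package size estimation
• Improving the accuracy of Spell Check
• Static code checks, especially syntax lint
• Code-standard admission checks for newcomers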
• inline edit
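A syntax lint of the kind listed above can be sketched as a walk over the syntax tree that reports every `ERROR` node, the node type tree-sitter inserts where it recovers from a syntax error. The tree below is hand-built and its shape (`type`, `row`, `children`) is an assumption for illustration; a real tool would obtain the tree from a tree-sitter binding.

```javascript
// Collect the rows of all ERROR nodes in a (hypothetical) plain-object tree.
function findSyntaxErrors(node, found = []) {
  if (node.type === "ERROR") found.push(node.row);
  for (const child of node.children || []) findSyntaxErrors(child, found);
  return found;
}

// Hand-built tree standing in for a parsed file with one broken statement.
const root = {
  type: "translation_unit", row: 0,
  children: [
    { type: "preproc_import", row: 0 },
    { type: "function_definition", row: 2, children: [
      { type: "expression_statement", row: 3 },
      { type: "ERROR", row: 4 },
    ] },
  ],
};

for (const row of findSyntaxErrors(root)) {
  console.log(`syntax error near line ${row + 1}`);
}
// prints "syntax error near line 5"
```

Because the parser keeps going after an error, the rest of the tree stays usable, so the same walk can also check style rules on the nodes that did parse.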
17. Syntax quiz
// FIXME
int (^square(int x))(void) {
    return ^{ return x * x; };
}
CGFloat (*msgSendIMP)(id, SEL, id, CGFloat) = (CGFloat (*)(id, SEL,
id, CGFloat))objc_msgSend;
NSString *string = @"First Line"
@"Second Line";
NSString *string = @"First Line"
"Second Line";
// FIXME
typedef struct _AspectBlock {
    __unused Class isa;
    void (__unused *invoke)(struct _AspectBlock *block, ...);
} *AspectBlockRef;
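18. New insights into coding standards
• Use attribute specifiers consistently
• e.g. availability, NS_SWIFT_NAME
• Preprocessor directives should not change the syntax rules
• expression -> expression, statement -> statement
• directive prefix @
• Keep macro definitions centralized and consistent where possible
• Use generics consistently
• Q: If no way of writing it is wrong, then what counts as the standard?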
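19. Retrospective
1. Get to know the community
2. Get started, with a clear goal
3. Work in small steps, get fast feedback, see results
4. Iterate in loops; unit tests guard against regressions
5. Keep track of overall progress, know where you are, be able to estimate the remaining work, and build confidence along the way
6. Write documentation; record progress in git commits
7. The process is tedious: stay focused and keep going
8. Release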