Yes, we're going to look at file parsing. Sounds a bit boring, right? Wrong.
In this talk, just for fun, we'll find out how to parse a file. We'll look at simple, hand-crafted parsers. We'll finally figure out just how lex and yacc work. And we'll pick apart structured parsers that build abstract syntax trees as you type - ReSharper style. How is an IDE's parser different to a compiler's? How do you handle sensible error recovery? What about significant whitespace?
Everything you always wanted to know about parsing a file, but were too afraid to ask.
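To make the "simple, hand-crafted parser" idea concrete, here is a minimal sketch of a tokenizer plus recursive-descent parser for arithmetic expressions. It is not taken from the talk: the grammar, the names `tokenize` and `parse`, and the tuple-based tree shape are all assumptions chosen for illustration.

```python
import re

# Hypothetical token pattern: a run of digits, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split text into ('num', int) and ('op', char) tokens, skipping spaces."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():
            tokens.append(("op", op))
    return tokens


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():  # expr := term (('+' | '-') term)*
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node

    def term():  # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | '(' expr ')'
        kind, value = advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return tree
```

Each grammar rule becomes one function, which is why this style is easy to write by hand. Note that this sketch simply stops at the first error; an IDE-style parser would instead need to recover and keep building a tree.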
Pointless Pointers - How to make our interfaces efficient? by Mateusz Pusz
These are the slides from my code::dive 2017 talk. You can find the recording here: https://www.youtube.com/watch?v=qrifyjQW9gA. All my other talks can be found here: https://train-it.eu/resources.
---
C++ is not C. C++ developers too often forget about that. The effects are often disastrous. nullptr dereferences, buffer overflows and resource leaks are problems often seen in the bug trackers of C++ applications. Does it have to be like that? The talk presents a few simple rules, tested in production, that will make most of those issues go away and never appear again in C++ software. Interested? Come and see :-)
Kaggle Google Quest Q&A Labeling retrospective: lightning talk slides, 47th place solution by Ken'ichi Matsui
The document discusses different approaches that were tried for improving the performance of a model for a question answering competition, including pre-training on additional data, modifying the model architecture by changing layers or heads, and using different loss functions or features. Various models were experimented with, such as BERT, RoBERTa, ALBERT, and XLNet. However, concatenating the question and answer encodings did not work as expected.
The document discusses principles for writing code that is easy to understand and maintain. It recommends using appropriate naming conventions that clearly convey meaning; consistent formatting and layout to improve aesthetics; comments to explain intent and issues; simplifying loops, logic and expressions; minimizing global variables and scope; and refactoring code into well-organized modules. The goal is to reduce frustration and time for future maintainers to understand the code.
The document contains 20 practice exercises involving Java programming concepts like variables, data types, operators, methods, and control flow. The exercises include writing code to calculate mathematical expressions, convert between temperature scales, find averages, and determine output based on different logical and relational expressions. Sample code is provided and students are asked questions to test their understanding of Java syntax and program execution order.
ODSC 2019: Sessionisation via stochastic periods for root event identification by Kuldeep Jiwani
In today's world, the majority of information is generated by self-sustaining systems such as bots, crawlers, servers and various online services. This information flows along the axis of time and is generated by these actors under some complex logic: for example, a stream of buy/sell order requests from an Order Gateway in the financial world, a stream of web requests from a monitoring or crawling service in the web world, or perhaps a hacker's bot sitting on the internet and attacking various computers. We may not be able to know the motive or intention behind these data sources, but with unsupervised techniques we can try to infer patterns or correlate events based on their repeated occurrence over time. Associating a chain of events in time order helps with root event analysis. In certain cases, time-ordered correlation and root event identification are good enough to automatically identify the signatures of various malicious actors and take appropriate corrective action to stop cyber attacks, malicious social campaigns, etc.
Sessionisation is one such unsupervised technique; it tries to find the signal in a stream of timestamped events. In an ideal world this would reduce to finding periods in a mixture of sinusoidal waves. In the real world it is a much more complex activity, as even the systematic events generated by machines over the internet behave far more erratically. So the notion of a period for a signal also changes in the real world: we can no longer associate it with a single number; it has to be treated as a random variable, with an expected value and an associated variance. Hence we need to model "stochastic periods" and learn their probability distributions in an unsupervised manner.
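As a rough illustration of treating a period as a random variable rather than a fixed number, the sketch below estimates the mean and variance of the gaps between consecutive events and then splits a stream into sessions wherever a gap exceeds a chosen threshold. This is a deliberately simple heuristic, not the Bayesian approach the talk describes; the function names `estimate_period` and `sessionise` and the `max_gap` parameter are assumptions for illustration.

```python
from statistics import mean, pvariance


def estimate_period(timestamps):
    """Return (mean, variance) of the gaps between consecutive events."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return mean(gaps), pvariance(gaps)


def sessionise(timestamps, max_gap):
    """Group events into sessions; a gap larger than max_gap starts a new one."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


# Example: a roughly periodic stream, then two bursts separated by a long pause.
period_mean, period_var = estimate_period([0, 10, 21, 30])
bursts = sessionise([0, 1, 2, 50, 51], max_gap=5)
```

One possible choice for `max_gap` is the estimated mean plus a few standard deviations, though that assumes the gaps within a session come from a single, reasonably well-behaved distribution.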
The main focus of this talk will be to showcase applied data science techniques for discovering stochastic periods. There are many ways to obtain periods in data, so the journey will begin with a walk-through of existing techniques such as the FFT (Fast Fourier Transform), followed by a discussion of Gaussian Mixture Models. After highlighting the shortcomings of these techniques, we will succinctly explain one of the most general non-parametric Bayesian approaches to solving this problem. Without going too deep into the complex math, we will then get back to applied data science and discuss a much simpler technique that can solve the same problem if certain assumptions are satisfied.
In this talk we will demonstrate some time-based patterns we discovered while working on a security analytics use case that uses Sessionisation. We will demonstrate these patterns on a publicly available open source malware attack dataset.
Key concepts explained in talk: Sessionisation, Bayesian techniques of Machine Learning, Gaussian Mixture Models, Kernel density estimation, FFT, stochastic periods, probabilistic modelling, Bayesian non-parametric methods
The document discusses several "outrageous ideas" for improving graph databases, such as using a column-oriented storage model inspired by relational databases, employing worst-case optimal join algorithms, adopting a semantic query optimizer informed by mathematical concepts, and leveraging recursion to enable queries over paths in graph structures. The presentation argues that current graph database implementations are flawed and lessons from relational databases have not been adequately applied.
An efficient map-reduce algorithm is presented for computing formal concepts from binary datasets in a single iteration. The algorithm first uses map-reduce to generate a sufficient set of concepts that can be used to enumerate the entire lattice of formal concepts. It then processes the reduced output on a single machine to generate the sufficient set. Finally, it selectively enumerates all formal concepts in the lattice by using the sufficient set, which avoids computing the entire lattice. This approach improves efficiency over previous algorithms that required multiple map-reduce iterations or sequential processing of the entire lattice.
I've found many papers and books about category theory, but many people still don't know how it can help. In this talk I'll help you better understand how math can help us develop more composable software.
Coder on Beer - Concrete
2018 - São Paulo
The document discusses input/output organization in computer systems. It describes peripheral devices like monitors, keyboards, printers, and storage devices that are connected to computers. It then explains the need for input/output interfaces to handle differences in signal values, timing, data formats, and operating modes between the CPU and peripherals. Common interface types include serial and parallel interfaces. The document outlines techniques for synchronous and asynchronous data transfer, including the use of handshaking protocols to ensure reliable communication between devices. It provides examples of specific interface chips like the 8251 serial interface adapter.
A Signature Algorithm Based On Chaotic Maps And Factoring Problems by Sandra Long
This document describes a new digital signature algorithm based on chaotic maps and factorization problems. It consists of three main phases:
1) System initialization which defines parameters like cryptographic hash function, large prime numbers p and q, element a of order n in GF(p), and multiplicative group G generated by a.
2) Key generation where the signer selects private keys d, x and computes public keys e, y using chaotic maps and modular arithmetic.
3) Signature generation where the signer selects a random number r, computes intermediate values using chaotic maps and factorization, and outputs the signature (v1, v2, S) for the hashed message. The security relies on the difficulty of simultaneously solving
The UK's fastest growing AI & Data career accelerator program can be summarized in 3 sentences:
The AiCore Programme offers software engineering, data science, data analysis, data engineering and machine learning specializations. It provides over 500 hours of coding experience, an internal job board for top industry roles, and support to become an irresistible candidate for your specialist career. Common professions for alumni include data analyst, data engineer, machine learning engineer, and software developer.
Deductive verification of unmodified Linux kernel library functions by Denis Efremov
This document discusses deductive verification of unmodified Linux kernel library functions using the Frama-C, AstraVer, and Why3 tools. 26 library functions were verified, with 25 being fully proved. Issues encountered included integer overflows, casts to smaller types, and pointer arithmetic on different memory blocks. Specifications were published online along with proof artifacts. Future work includes a "lemma functions" extension to Frama-C to support more automated verification.
Automatic Selection of Predicates for Common Sense Knowledge Expression by the Natural Language Processing Laboratory, Nagaoka University of Technology
Ai Makabi, Hiroshi Matsumoto and Kazuhide Yamamoto. Automatic Selection of Predicates for Common Sense Knowledge Expression. Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING 2013), no page numbers (2013.9)
The document discusses key topics in software engineering including software products, product attributes, the importance of product characteristics, the software engineering process, engineering process models, software process models, and the advantages and problems of different process models. It introduces these topics and provides some brief explanations about each one.
This document provides sample placement questions from i2 Technologies along with their solutions. It also describes a sample problem about merging two schools with different number of classes and sections into a new school while keeping students of the same class and section together. Finally, it provides links to download additional placement papers and interview questions from other companies on the ITtestPapers website.
1. The document contains a sample paper for a Computer Science subject exam with 7 questions covering topics like operating system functions, data types, loops, conditional statements, functions etc. It provides the expected answers for the questions.
2. The questions assess students' understanding of basic programming concepts like data types, operators, loops, functions etc. and their ability to write simple programs to solve problems like checking leap year, Armstrong number, Fibonacci series etc.
3. The answer key provides concise yet comprehensive responses covering the key aspects being tested in each question like explaining different loop types, data types, debugging, functions of header files etc. and includes code snippets for programming questions.
This document proposes an online planning system using fuzzy logic to control the motion of a serial manipulator robot with multiple links. The system consists of separate fuzzy logic controllers for each link. Each fuzzy controller uses inputs like position error and current joint position to determine the change in joint position needed to move the end effector towards the desired goal position. Simulation results showed the robot could successfully reach the goal point with minimum error in path planning. Membership functions and rules were designed for each fuzzy controller based on the robot kinematics model. The center of gravity defuzzification method was used to determine the crisp output from each fuzzy controller.
The document discusses 10 important C programming interview questions. It provides detailed solutions to questions such as swapping two variables without a temporary variable, solving the 8 queens problem, printing a matrix helically, reversing words in a sentence in-place, generating permutations, and calculating the factorial of a number recursively. For each question, it explains the algorithm and provides sample C code to implement the solution.
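Two of the questions listed above are small enough to sketch here. The summarized document gives its solutions in C; the version below is in Python and is only an illustration, with the function names `xor_swap` and `factorial` chosen for this example rather than taken from the document.

```python
def xor_swap(a, b):
    """Swap two integers without a temporary variable, using XOR."""
    a ^= b  # a now holds a XOR b
    b ^= a  # b becomes the original a
    a ^= b  # a becomes the original b
    return a, b


def factorial(n):
    """Compute n! recursively for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    return 1 if n <= 1 else n * factorial(n - 1)
```

The XOR trick only applies to integers, and in C it breaks if both operands refer to the same memory location, so a plain temporary variable is usually the better choice in real code.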
This document contains a 25 question multiple choice quiz about computer science topics. The questions cover topics like Boolean algebra, graphs, probability, algorithms, automata theory, programming languages, operating systems, computer networks, databases and software engineering. The majority of the questions can be answered by applying basic concepts from these topics at an undergraduate level.
This document contains a 25 question multiple choice quiz about computer science topics. The questions cover topics like Boolean algebra, graphs, probability, algorithms, automata theory, programming languages, operating systems, computer networks, databases and software engineering. For each question there are 4 possible answer choices, with one being marked as the correct answer. Explanations are provided for some of the questions.
The document discusses ESL (electronic system level) design and some challenges with adopting ESL flows that use C/C++ as a design entry language. It provides examples of how C/C++ code can unintentionally result in inefficient hardware implementations if the designer does not consider the hardware implications. The document advocates that ESL adoption needs to be driven by designer needs and preferences rather than management decisions. It also argues that ESL tools need to provide predictability of results, education for designers on the hardware implications of different coding styles, and robust verification methods for ESL to be widely adopted.
This document contains a GATE study material question paper with 30 single mark questions and 2 two-mark questions on computer science topics. The questions cover concepts like structured programming, data structures, algorithms, computer architecture, operating systems, databases, computer networks and logic. The document also provides information about a website that offers GATE preparation material, forums and downloads.
This document contains an agenda for a presentation on embedded systems. It includes an introduction to embedded systems, why embedded C is used, sample interview questions, and a Q&A section. Some key interview questions cover real-time systems, software testing, pointers, macros, variable scopes, and debugging with tracing. Example code is provided to demonstrate pointers, a macro to set the most significant bit, and a function to find the maximum of two values.
LDCQ paper Dec21 with answer key_62cb2996afc60f6aedeb248c1d9283e5.pdf by Vedant Gavhane
The document contains 50 multiple choice questions related to C programming concepts like structures, unions, pointers, arrays, functions, strings etc. The key concepts covered include:
- The size of a union is determined by its largest member, while the size of a structure is at least the sum of its members' sizes (plus any padding)
- Arrays can be passed to functions by the name of the array
- Pointers store the address of other variables
- Structures allow grouping of different data types under one name
- String functions like strcmp() return 0 if the strings are identical
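A quick way to check how structure and union sizes relate to their members is Python's `ctypes` module, which lays out types using the platform's C rules. This is a minimal sketch; the type names `Pair` and `Either` and their fields are made up for illustration, and the exact sizes depend on the platform.

```python
import ctypes


class Pair(ctypes.Structure):
    # Hypothetical struct: one char followed by one int.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]


class Either(ctypes.Union):
    # Hypothetical union with the same two members sharing storage.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]


member_sizes = [ctypes.sizeof(ctypes.c_char), ctypes.sizeof(ctypes.c_int)]
struct_size = ctypes.sizeof(Pair)    # sum of the members plus any padding
union_size = ctypes.sizeof(Either)   # large enough for the biggest member

print("members:", member_sizes, "struct:", struct_size, "union:", union_size)
```

The structure has to hold every member at once, so padding can make it larger than the sum of its parts, whereas the union only needs room for its largest member.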
This document contains a 20 question mock exam for the GATE exam. It provides instructions that each question is worth 1 mark, unanswered questions receive 0 marks and incorrect answers receive negative marks. It then lists 20 multiple choice questions related to computer science topics like operating systems, algorithms, data structures, computer networks and formal languages. For each question there are 4 possible answer choices and space to write the answer.
The document contains problem sets and questions related to sequential programming in C++. It includes questions on converting mathematical expressions to C++ expressions, evaluating expressions based on precedence, identifying valid and invalid variable names, tracing output of C++ code snippets, and writing C++ programs to solve problems related to basic calculations, data type conversions, and operations on user input values.
The how-dare-you-call-me-an-idiot’s guide to the .NET Standard (NDC London 2017) by citizenmatt
After the initial excitement of .NET Core wore off (Cross platform! Open source!), we realised there were a few things missing. APIs, mostly.
Oh, and compatibility with a lot of your favourite libraries and packages. Fortunately, the .NET Standard is here to fix all of this, adding back APIs, restoring compatibility and even replacing PCLs. This talk is all about the How and the Why, mixed in with a healthy dose of Why Should I Care. We'll even have a little geek out over the technical details. If type forwarding can't restore your excitement levels to fever pitch, I don’t know what will!
(Slides from NDC London 2017)
How to Parse a File (DDD North 2017)
3. @citizenmatt
Why would we write a parser?
• Speed, efficiency
• Reduce dependencies
• Custom or simple formats
• Things that aren’t files - DSLs
Command line options, HTTP headers, stdout, natural language commands
E.g. YouTrack queries
• When we’re just as interested in the structure of a file as its contents
15. @citizenmatt
What is a lexer (aka scanner)?
• Performs lexical analysis
Lexical - relating to the words or vocabulary of a language
• Converts a string into a stream of tokens
Identifier, comment, string literal, braces, parentheses, whitespace, etc.
• Tokens are lightweight - typically integer values
(ReSharper uses singleton object instances)
• Parser pattern matches over tokens
Integer or object reference comparisons
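A minimal sketch of the idea on this slide, written in Python for brevity. The token names and regexes are illustrative assumptions, not ReSharper's; Python's `re` is a backtracking engine rather than the generated state machine discussed later, so only the interface (string in, stream of cheap integer-typed tokens out) is the point here.

```python
import re
from enum import IntEnum

class TokenType(IntEnum):
    # Hypothetical token set; token types are plain integers, so comparisons are cheap.
    WHITESPACE = 0
    IDENTIFIER = 1
    NUMBER = 2
    STRING_LITERAL = 3
    LBRACE = 4
    RBRACE = 5
    BAD_CHARACTER = 6

# (token type, regex) pairs; order matters when two rules match at the same position.
RULES = [
    (TokenType.WHITESPACE, r"\s+"),
    (TokenType.IDENTIFIER, r"[A-Za-z_]\w*"),
    (TokenType.NUMBER, r"[0-9]+"),
    (TokenType.STRING_LITERAL, r'"[^"\n]*"'),
    (TokenType.LBRACE, r"\{"),
    (TokenType.RBRACE, r"\}"),
    (TokenType.BAD_CHARACTER, r"."),
]
# One combined pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<T{int(t)}>{rx})" for t, rx in RULES))

def lex(text):
    """Yield (token_type, start, end) triples covering the whole input."""
    for m in MASTER.finditer(text):
        yield TokenType(int(m.lastgroup[1:])), m.start(), m.end()
```

A parser built on top of this only compares `TokenType` values; it goes back to the text via the offsets when it needs to.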
17. @citizenmatt
Lexers are a solved problem
Use a lexer generator
lex (1975), flex, CsLex, FsLex, JFlex, etc.
18. @citizenmatt
Anatomy of a lexer input file
User code (e.g. using directives)
%%
directives
set up namespaces, class names, interfaces
declare regex macros
declare states
%%
rules and actions
<state> rule { action }
20. @citizenmatt
How does it work?
• Lexer generates source code
• Rules (regexes) converted into single Finite State Machine
All regexes combined, matched at same time
• Encoded in state transition tables
• Lookup based on state and input char
• Very fast
• Not very maintainable
Seriously
22. @citizenmatt
Rule: a(b|c)d*e+
State  ‘a’   ‘b’   ‘c’   ‘d’   ‘e’   other
0      m(1)  E     E     E     E     E
1      E     m(2)  m(2)  E     E     E
2      E     E     E     m(2)  m(3)  E
3      a     a     a     a     m(3)  a
m(x) - match, move to state x
a - accept
E - error
Pete Jinks - http://www.cs.man.ac.uk/~pjj/cs211/ho/node6.html
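The table can be driven by a very small loop. The following Python sketch encodes the table for a(b|c)d*e+ exactly as shown; the function name and the choice to treat end of input as the "other" column are assumptions made for the example.

```python
ERROR, ACCEPT = "E", "a"

# Column index for each input character; anything else uses the "other" column.
COLUMNS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
OTHER = 5

# TABLE[state][column]: an int means "match and move to that state".
TABLE = [
    [1, ERROR, ERROR, ERROR, ERROR, ERROR],       # state 0
    [ERROR, 2, 2, ERROR, ERROR, ERROR],           # state 1
    [ERROR, ERROR, ERROR, 2, 3, ERROR],           # state 2
    [ACCEPT, ACCEPT, ACCEPT, ACCEPT, 3, ACCEPT],  # state 3
]

def match_prefix(text, start=0):
    """Return the end offset of the token starting at `start`, or None on error."""
    state, pos = 0, start
    while True:
        # Assumption: running off the end of the input behaves like "other".
        column = COLUMNS.get(text[pos], OTHER) if pos < len(text) else OTHER
        action = TABLE[state][column]
        if action == ERROR:
            return None
        if action == ACCEPT:
            return pos
        state, pos = action, pos + 1
```

For example, `match_prefix("acddee!")` stops at the `!` and reports a token covering `acddee`, while `match_prefix("ab")` is an error because state 2 still needs an `e`.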
36. Rules: a(b|c)d*e+ and [0-9]+
(State diagram: [0-9] takes state 0 to a new state 4, which loops on [0-9])
State  ‘a’   ‘b’   ‘c’   ‘d’   ‘e’   [0-9]  other
0      m(1)  E     E     E     E     m(4)   E
1      E     m(2)  m(2)  E     E     E      E
2      E     E     E     m(2)  m(3)  E      E
3      a     a     a     a     m(3)  a      a
4      a     a     a     a     a     m(4)   a
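With both rules in one table, the accepting state tells the lexer which rule matched. A Python sketch of a complete tokenizing loop over the combined table; the token names WORD and NUMBER, and treating end of input as "other", are assumptions for illustration.

```python
E, A = "E", "a"  # error and accept actions, as in the table

COLUMNS = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
COLUMNS.update({digit: 5 for digit in "0123456789"})
OTHER = 6

TABLE = [
    [1, E, E, E, E, 4, E],  # state 0
    [E, 2, 2, E, E, E, E],  # state 1
    [E, E, E, 2, 3, E, E],  # state 2
    [A, A, A, A, 3, A, A],  # state 3: accepts a(b|c)d*e+
    [A, A, A, A, A, 4, A],  # state 4: accepts [0-9]+
]
# Hypothetical token names for the two accepting states.
ACCEPTING = {3: "WORD", 4: "NUMBER"}

def tokenize(text):
    """Split `text` into (token_name, lexeme) pairs using the combined table."""
    tokens, pos = [], 0
    while pos < len(text):
        state, start = 0, pos
        while True:
            column = COLUMNS.get(text[pos], OTHER) if pos < len(text) else OTHER
            action = TABLE[state][column]
            if action == E:
                raise ValueError(f"unexpected input at offset {pos}")
            if action == A:
                tokens.append((ACCEPTING[state], text[start:pos]))
                break
            state, pos = action, pos + 1
    return tokens
```

Both regexes are matched in the same pass: the loop never tries one rule and then the other, it just follows transitions until the table says accept or error.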
44. @citizenmatt
What is a parser?
• Performs syntactic analysis
Verifies and matches syntax of a file
• Pattern matching on stream of tokens from lexer
Can look at token offsets and text, too
• Syntax is described by a grammar
• Grammar is represented as a recursive hierarchy of rules
Top level is the whole file, composing down to structures and tokens
47. @citizenmatt
Types of parsers
• Top down/recursive descent
Match the root of the tree, recursively split up into child elements
• Bottom up/recursive ascent
Start with matching the leaves of the tree, combine into larger constructs as you go
50. @citizenmatt
Building a parser
• Hand rolled
Mechanical process to build. Easy to understand
Usually top down/recursive descent
Can use grammar to build syntax tree classes
• Parser generators
yacc/bison, ANTLR, etc.
Usually bottom up. Can be hard to debug - table driven
• ReSharper mostly uses top-down procedural parsers
Generated and hand rolled
Mainly historical. Easier to maintain, easier error recovery, etc.
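To show why hand-rolled recursive descent is described as mechanical, here is a Python sketch for a made-up two-rule grammar (file: block*, and block: IDENTIFIER LBRACE block* RBRACE). The grammar, class name and tuple-based tree are assumptions for the example; each grammar rule simply becomes one method.

```python
class Parser:
    """Recursive descent over a list of (token_type, text) pairs."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Token type at the current position, or "EOF" past the end.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, token_type):
        if self.peek() != token_type:
            raise SyntaxError(
                f"expected {token_type}, found {self.peek()} at token {self.pos}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_file(self):
        # file: block*
        blocks = []
        while self.peek() != "EOF":
            blocks.append(self.parse_block())
        return ("file", blocks)

    def parse_block(self):
        # block: IDENTIFIER LBRACE block* RBRACE
        name = self.expect("IDENTIFIER")[1]
        self.expect("LBRACE")
        children = []
        while self.peek() == "IDENTIFIER":
            children.append(self.parse_block())
        self.expect("RBRACE")
        return ("block", name, children)
```

The call stack mirrors the tree being built, which is a large part of why this style is easy to step through in a debugger.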
58. @citizenmatt
Which doesn’t match the grammar
Grammar:
shaderBlock:
  SHADER_KEYWORD
  STRING_LITERAL
  LBRACE
  …
  RBRACE
;
Token stream:
SHADER_KEYWORD
NEW_LINE
WHITESPACE
STRING_LITERAL
NEW_LINE
WHITESPACE
NEW_LINE
COMMENT
NEW_LINE
LBRACE
WHITESPACE
…
WHITESPACE
RBRACE
59. @citizenmatt
Filtering lexers
• Filter whitespace and comments from the stream of tokens
ReSharper’s tokens have IsFiltered property
• Decorator pattern
Wrap original lexer, swallow filtered tokens
Diagram: Lexer -> Filtering lexer -> Parser -> Program structure
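A sketch of the decorator in Python. `ListLexer` is a stand-in for a generated lexer, and the `current`/`advance` interface and the `FILTERED` set are assumptions for the example rather than ReSharper's actual API.

```python
FILTERED = {"WHITESPACE", "NEW_LINE", "COMMENT"}

class ListLexer:
    """Stand-in for a generated lexer: replays a fixed list of (type, text) tokens."""

    def __init__(self, tokens):
        self._tokens = tokens
        self._index = 0

    def current(self):
        return self._tokens[self._index] if self._index < len(self._tokens) else None

    def advance(self):
        self._index += 1

class FilteringLexer:
    """Decorator with the same interface that never exposes a filtered token."""

    def __init__(self, inner):
        self._inner = inner
        self._skip()

    def _skip(self):
        # Swallow filtered tokens so the parser only sees significant ones.
        while self._inner.current() is not None and self._inner.current()[0] in FILTERED:
            self._inner.advance()

    def current(self):
        return self._inner.current()

    def advance(self):
        self._inner.advance()
        self._skip()
```

Because the wrapper has the same interface as the lexer it wraps, the parser does not need to know filtering is happening, and further decorators can be stacked on top.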
61. @citizenmatt
IDE requirements, Part 1
• Code editor features
Syntax highlighting, code folding, etc.
• Syntax error highlighting
• Inspections
• Refactoring
• Formatting
• Etc.
62. @citizenmatt
IDE requirements, Part 1
• Need to work with the contents and structure of a file
• Contents give us semantic information
• Structure allows us to report inspections, refactor, etc.
Map the semantics back to the file
• Need to represent the structure of the file
• Syntax tree is obvious choice
Inspections walk the tree, refactorings rewrite the tree
66. @citizenmatt
Back to Filtering Lexers
• If we filter tokens out, we have to add them back again
• We need a Missing Tokens Inserter to add whitespace and comments back into parse tree
Diagram: Lexer -> Filtering lexer -> Parser -> Concrete parse tree, with the Missing tokens inserter adding the filtered tokens back into the tree
67. @citizenmatt
Missing Tokens Inserter
• Walk leaf elements of tree
Tokens
• Advances (cached) lexer for each leaf element
• Check current lexer token has same offset as leaf element
• If not, create leaf element and insert into tree
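The walk described above can be sketched in Python as follows. The dict-based tree, the field names and the policy of attaching a missing token immediately before the next leaf are all assumptions for the example; a real implementation decides more carefully which parent owns each piece of whitespace.

```python
def insert_missing_tokens(root, all_tokens):
    """Re-insert tokens the filtering lexer dropped. Mutates `root` in place.

    Leaves are dicts with "type", "start" and "end"; composite nodes have "children".
    `all_tokens` is the cached, unfiltered token list in source order.
    """
    index = 0  # position in the cached token list

    def visit(node):
        nonlocal index
        new_children = []
        for child in node["children"]:
            if "children" in child:
                visit(child)
            else:
                # Any cached token that starts before this leaf was filtered out.
                while all_tokens[index]["start"] != child["start"]:
                    new_children.append(dict(all_tokens[index]))
                    index += 1
                index += 1  # the cached token that corresponds to this leaf
            new_children.append(child)
        node["children"] = new_children

    visit(root)
    # Trailing whitespace or comments after the last leaf go on the root.
    root["children"].extend(dict(t) for t in all_tokens[index:])
    return root

def leaf_types(node):
    """Flatten the tree back into its leaf token types, left to right."""
    if "children" not in node:
        return [node["type"]]
    return [t for child in node["children"] for t in leaf_types(child)]
```

After the inserter has run, concatenating the leaves reproduces every token of the original file, which is what makes the tree usable for formatting and refactoring.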
69. @citizenmatt
How do we parse this?
There are no end of scope markers!
And we’ve filtered out the whitespace!
let ArraySample() =
    let numLetters = 26
    let results = Array.create numLetters 0
    let data = "The quick brown fox"
    for i = 0 to data.Length - 1 do
        let c = data.Chars(i)
        let c = Char.ToUpper(c)
        if c >= 'A' && c <= 'Z' then
            let i = Char.code c - Char.code 'A'
            results.[i] <- results.[i] + 1
    printf "done!\n"
70. @citizenmatt
Insert zero-width tokens
• Another lexer decorator
• Keeps track of whitespace before it’s filtered
• Inserts “invisible” tokens into token stream indicating indent/outdent or block start/end
Possibly also token to indicate invalid indentation
• Token is zero-width. Doesn’t affect parse tree
• Parser can match these invisible tokens in grammar
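A Python sketch of such a decorator, written as a function over a token list to keep it short. The token names INDENT, DEDENT and BAD_INDENT, the (type, text, offset) shape and counting each whitespace character as width 1 are assumptions for the example.

```python
def insert_indent_tokens(tokens):
    """tokens: (type, text, offset) triples, whitespace included.

    Returns the stream without whitespace but with zero-width
    INDENT / DEDENT / BAD_INDENT tokens marking block structure.
    """
    out = []
    indents = [0]          # stack of currently open indentation widths
    at_line_start = True
    width = 0              # leading whitespace seen on the current line
    for kind, text, offset in tokens:
        if kind == "NEW_LINE":
            at_line_start, width = True, 0
            continue
        if kind == "WHITESPACE":
            if at_line_start:
                width += len(text)
            continue
        if at_line_start:
            at_line_start = False
            if width > indents[-1]:
                indents.append(width)
                out.append(("INDENT", "", offset))
            else:
                while width < indents[-1]:
                    indents.pop()
                    out.append(("DEDENT", "", offset))
                if width != indents[-1]:
                    # Outdented to a column that never opened a block.
                    out.append(("BAD_INDENT", "", offset))
        out.append((kind, text, offset))
    # Close any blocks still open at end of input.
    end = tokens[-1][2] + len(tokens[-1][1]) if tokens else 0
    while len(indents) > 1:
        indents.pop()
        out.append(("DEDENT", "", end))
    return out
```

The inserted tokens have empty text, so they occupy no range in the file; the grammar can then use INDENT and DEDENT the way a brace language uses LBRACE and RBRACE.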
72. @citizenmatt
Altering tokens
• F# example: 2. and [2..0] ambiguous
• Original lexer matches 2. as FLOAT and 2.. as INT_DOT_DOT
• Another lexer decorator
Augment generated rules with custom code
• Decorator recognises INT_DOT_DOT
Splits into two tokens for parser
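The splitting decorator is small enough to sketch in full. This Python generator assumes tokens are (type, text, offset) triples and that the replacement token names are INT and DOT_DOT; both are illustrative choices.

```python
def split_int_dot_dot(tokens):
    """Lexer decorator: replace each INT_DOT_DOT token with INT followed by DOT_DOT."""
    for kind, text, offset in tokens:
        if kind == "INT_DOT_DOT":
            digits = text[:-2]  # everything before the trailing ".."
            yield ("INT", digits, offset)
            yield ("DOT_DOT", "..", offset + len(digits))
        else:
            yield (kind, text, offset)
```

The generated lexer rules stay simple, and the parser only ever sees tokens it has grammar rules for.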
73. @citizenmatt
When regexes aren’t enough
• ShaderLab nested comments
• Not possible to match with regex
Don’t even try
• Rule to match start of comment - /*
Finish lexing by hand, counting start and end comment chars
Ignore START_COMMENT and return different token - COMMENT
• It doesn’t have to be completely machine generated
/* This /* is */ valid */
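Finishing the comment by hand amounts to a depth counter. A Python sketch, assuming the generated rule has already matched the opening `/*` at `start`; the function name and the decision to let an unterminated comment run to end of input are assumptions.

```python
def lex_nested_comment(text, start):
    """`text` must contain '/*' at `start`.

    Returns the offset one past the matching '*/', or len(text) if the
    comment is never closed.
    """
    assert text.startswith("/*", start)
    depth, pos = 1, start + 2
    while pos < len(text) and depth > 0:
        if text.startswith("/*", pos):
            depth, pos = depth + 1, pos + 2   # nested comment opens
        elif text.startswith("*/", pos):
            depth, pos = depth - 1, pos + 2   # one level closes
        else:
            pos += 1
    return pos
```

The lexer then returns a single COMMENT token spanning `start` to the returned offset and resumes normal table-driven lexing from there.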
75. @citizenmatt
Pre-processor tokens
• Pre-processor tokens can appear anywhere
• How do you add them to the grammar/parser?
• ShaderLab has CGPROGRAM and CGINCLUDE which are essentially pre-processor tokens
• (Also nested language - Cg)
76. @citizenmatt
Parsing pre-processor tokens
• Two pass parsing
• First pass parses pre-processor tokens
• Filtering lexer strips pre-processor tokens
• Parse normally
• Parsed pre-processor tree nodes inserted as missing tokens
79. @citizenmatt
IDE Requirements, Part 2
• Error highlighting
The code is broken every time you type
• Incremental lexing + parsing
Performance
• Version tolerance
E.g. multiple versions of C#
• Nested/composable languages
83. @citizenmatt
What happens when there’s an error?
• The parser adds an error element into the tree
• Error element spans whatever has been parsed so far
Might just be unexpected token, or incorrect element construct
• Highlighting the error in the editor is trivial
Inspection simply looks for error element, adds highlight
84. @citizenmatt
How do we find an error?
• Error start is obvious
mismatched rule, unexpected token
• Where does the error stop?
Off by one token could affect rest of file
• IDE must try to recover
How?
85. @citizenmatt
Error recovery
• Panic mode
Eat tokens until the parser finds a “follows” token
• Token insertion/removal/substitution
• Error rules in grammar
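Panic mode is the simplest of the three to sketch. In this Python example the function name, the tuple-based error element and the choice of follow set are assumptions; the caller supplies whichever token types can legitimately follow the construct being parsed.

```python
def recover(tokens, pos, follows):
    """Panic-mode recovery over a list of (token_type, text) pairs.

    Skips tokens from `pos` until one whose type is in `follows` (or the end
    of input). Returns (error_element, new_pos); the error element wraps the
    skipped tokens so they stay in the tree.
    """
    start = pos
    while pos < len(tokens) and tokens[pos][0] not in follows:
        pos += 1
    return ("ERROR", tokens[start:pos]), pos
```

The parser adds the returned error element to the tree and carries on from `new_pos`, so one bad statement does not take the rest of the file with it.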
89. @citizenmatt
Error production rules
• Create a rule that anticipates an error
• E.g. consume any tokens that shouldn’t be there
emptyBlock:
LBRACE
errorElementWithoutRBrace*
RBRACE
;
91. @citizenmatt
What’s the problem?
• Don’t parse entire file on every change
• Only reparse smallest subtree that encloses change
Block nodes (method bodies, classes, etc. Not if, for, etc.)
• Avoid re-lexing the entire file, too
92. @citizenmatt
Incremental lexing
• Requires a cache of the original token stream
Token type, offsets and state of lexer (int)
• Copy cached tokens up to change position
• Restart lexer at change position with known state from cache
• Lex until we can match tail of cached tokens
93. @citizenmatt
Incremental parsing
• Walk up syntax tree, find nearest element that can reparse and that encompasses change
E.g. method/class body
• Find start of block
E.g. opening LBRACE ‘{‘
• Use updated cached lexer to find end of block
E.g. closing RBRACE ‘}’
• Parse block, add new element into tree
Uses custom entry point into parser
95. @citizenmatt
Three types
• Injected languages
E.g. self-contained islands in a string literal (regex)
• Inherited languages
E.g. TypeScript is a superset of JavaScript
• Nested languages
E.g. JavaScript/CSS nested inside HTML. Razor and C#
96. @citizenmatt
Injected languages
• Build a parse tree for the contents of another node
E.g. ShaderLab CG_PROGRAM, regular expressions, …
• Provides syntax highlighting, code completion, etc.
• Attaches a new parse tree to the node of another tree
• Changes to injected tree persisted to string and pushed as change to the owning tree
• Changes to owning tree cause full reparse of injected language
97. @citizenmatt
Inherited languages
• E.g. TypeScript is a superset of JavaScript
• TypeScriptParser derives from JavaScriptParser
Share a lexer
• Custom hand rolled parsers
Recursive descent
• Easier to inherit and override key methods
Gang of Four Template Method pattern
• Also XamlParser, MSBuildParser, WebConfigParser
Custom XML parsers
98. @citizenmatt
Nested languages
• E.g. .aspx, .cshtml - HTML superset, with C# “islands”
• ReSharper parses .aspx/.cshtml file
Builds parse tree for ASPX/Razor syntax
• HTML superset requires lexer superset
• HtmlCompoundLexer lexes “outer” language’s tokens
When it encounters HTML, it switches to the standard HTML lexer
• How to handle C# islands?
99. @citizenmatt
Secondary documents
• ASPX/Razor - C# islands
• Create secondary in-memory C# file
Mirrors what gets generated when .aspx file is compiled
• Maps C# islands in .aspx to in-memory C# file
• Inspections, code completion, etc. work through the
mapping