String Parsing
Michael Heron
Introduction
• Having got a string in your system, how do you manipulate it?
• Strings are a fundamental form of data representation.
• Often obtained from text files and user input.
• Most strings are not in an easily managed form.
• The process of parsing is used to render raw data into more
refined forms.
Parsing
• There are many reasons why we may wish to parse data.
• Information comes in as a string – we want it in an array.
• Information comes in as a list of numbers stored as strings; we want
them in objects.
• We are rarely lucky enough to be able to manipulate data
the instant it comes into the system.
Data Representation
• The absolute most important thing in designing a program is
to represent your data correctly.
• If you get this right, everything is easier as a result.
• If you get it wrong, everything is more difficult.
• Before you ever write a line of code, consider how data must
be represented in the system.
• What variables, objects and arrays are you going to use?
Data Representation
• Consider how you are going to need to manipulate the data in
the system.
• Are you going to need to be able to search through things?
• Are you going to need to process each value in turn?
• Are you going to need to represent relationships between things?
• An easily manipulated data structure is worth its weight in
gold.
Parsing
• Parsing is the process of turning difficult-to-manipulate data
into a more useful format.
• Break strings up into all their constituent parts
• Convert from multiple arrays into an array of objects
• Important first step before more complex processing.
• Various standard techniques exist to facilitate this.
Common Parsing Tasks
• Tokenization
• Turn a string into several smaller strings (tokens) by splitting it
at a delimiter
• Object processing
• Breaking multiple data fields out of a single string and configuring
an object
• Data conversion
• Bringing data elements into some common format
• Often necessary to combine different processes.
Tokenization
• Tokenization is the process of splitting up strings.
• Based on the idea of a delimiter.
• Strings that have a common, delimited structure are amenable
to tokenization.
• 10,20,30,40
• Jim,Jake,Jane,Johana
• Strings are broken up based on the delimiter and the result is
an array of strings.
Object Processing
• Object processing involves the creation of a ‘blank’ object and
setting its attributes as a result of input.
• Often done after tokenization of input.
• The end result is an object configured as desired.
• One way to handle persisting objects in files.
• May be repeated.
• Create an array of appropriately configured objects.
Data Conversion
• Parsing is also a good time to convert data into
more appropriate representations.
• After pulling numbers in from a file, they’re usually stored as
strings.
• Can use various conversion functions to clean up representation.
• atoi, as an example (std::stoi is the C++11 counterpart for std::string)
• Can convert from rough representations to more precise
representations.
Example
• Consider the following example scenario – calculate the Flesch
Readability index of a document.
• Need to determine:
• Number of sentences
• Number of words
• Number of syllables in words
• Read in as a string from a text file.
• Must be parsed.
The Hard Way
• Can manipulate a string directly.
• Count spaces in a string.
• That gives the word count, roughly (spaces + 1)
• Count full stops in a string
• That gives the number of sentences, roughly
• Syllable count?
• Uh…
• Horrors upon horrors
• Must parse to get a structure amenable to processing.
• An array of strings.
String Processing
• The string class provides many useful functions for handling such
parsing.
• The find function gives the position of a character or substring, or
string::npos if it is not found.
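#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "Hello World";
    // find returns the index of the first match at or after position 0,
    // or string::npos if there is no match
    size_t index = str.find("e", 0);
    if (index != string::npos) {
        cout << "Found at: " << index << endl;
    }
    return 0;
}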
String Processing
• The substr function extracts part of a string, given a start
position and a length:
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "Hello World";
    // 5 characters starting at index 0
    string sub = str.substr(0, 5);
    cout << "Substring is: " << sub << endl;
    return 0;
}
Working With Strings
• Strings also contain a very useful length function.
• This tells you how many characters they contain.
• Also possible to index a string just like an array.
• This lets you get individual characters out of a string.
• Can combine these into powerful functions.
Tokenization
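#include <iostream>
#include <string>
using namespace std;
int main() {
    const int MAX_TOKENS = 100;
    string arr[MAX_TOKENS];  // the tokens found
    int size = 0;            // how many tokens are stored in arr
    string str = "Snausages are snausages for snausages";
    size_t start = 0;        // index where the current token begins
    // Run one step past the last character so the final token is captured whole
    for (size_t i = 0; i <= str.length(); i++) {
        // Keep scanning until a space or the end of the string
        if (i < str.length() && str[i] != ' ') {
            continue;
        }
        // Store the token, skipping empty ones caused by repeated spaces
        if (i > start && size < MAX_TOKENS) {
            arr[size] = str.substr(start, i - start);
            size += 1;
        }
        start = i + 1;
    }
    for (int j = 0; j < size; j++) {
        cout << arr[j] << endl;
    }
    return 0;
}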
Tokenization
• There are other ways to tokenize.
• This is just one way to show the power of string manipulation.
• Serves as a basis for more complex data parsing.
• Important to be able to do this, since almost every program has to
parse its input at some point or another.
Object Representation
• Can combine tokenization with object representation.
• Tokenize individual elements.
• Convert them to appropriate data format.
• Use the object’s mutator (setter) methods to configure it.
• Can easily set up large numbers of objects with this kind of
system.
• Combine the objects in an array for the best of both worlds.
This Is The End…
• That brings us to the end of the scheduled content for
C++.
• Cheer / Cry as you feel is appropriate.
• Next week will be used for consolidation.
• Thursday’s lecture will be a formal revision lecture covering all the
topics we have met so far.
• Wednesday’s lecture/tutorial will be a drop-in revision session. No
planned content; come along with whatever questions you have.
Some Final Thoughts
• Programming is hard.
• I did warn you at the start!
• It’s also a very rare and valuable skill
• One that you are well on the way to building properly.
• It is a skill that requires training.
• Like playing a musical instrument or fighting off ninjas.
• Important not to let it slide.
Some Final Thoughts
• It’s worthwhile keeping a notebook of ‘things I wish I had software
to do’.
• It can serve as a basis for further exploration of programming.
• Don’t worry if you don’t know how to do the things.
• Research is a constant part of programming. Nobody knows how to
do everything.
• Stretching yourself by setting tasks you don’t know how to do is a
great way to learn.
• Even if you never complete it, the process is valuable.
Summary
• Parsing is an important part of software development.
• It helps you turn unstructured data into structured data.
• Comes in many forms.
• String parsing is the most immediately useful of these.
• Tokenization is a key parsing technique.
• Worth playing about with.

Parsing Strings and Converting Data

  • 2. Introduction • Having got a string in your system, how do you manipulate it? • Strings are fundamental forms of data representation. • Often obtained from text-files and user input. • Most strings are not in an easily managed form. • The process of parsing is used to render raw data into more refined forms.
  • 3. Parsing • There are many reasons why we may wish to parse data. • Information comes in as a string; we want it in an array. • Information comes in as lists of numbers stored as strings; we want them in objects. • We are rarely lucky enough to be able to manipulate data immediately as it comes into the system.
  • 4. Data Representation • The absolute most important thing in designing a program is to represent your data right. • If you get this right, everything is easier as a result. • If you get it wrong, everything is more difficult. • Before you ever write a line of code, consider how data must be represented in the system. • What variables, objects and arrays are you going to use?
  • 5. Data Representation • Consider how you are going to need to manipulate the data in the system. • Are you going to need to be able to search through things? • Are you going to need to process each value in turn? • Are you going to need to represent relationships between things? • An easily manipulated data structure is worth its weight in gold.
  • 6. Parsing • Parsing is the process of turning difficult-to-manipulate data into a more useful format. • Break strings up into all their constituent parts. • Convert from multiple arrays into an array of objects. • An important first step before more complex processing. • Various standard techniques exist to facilitate this.
  • 7. Common Parsing Tasks • Tokenization • Turn a string into several smaller strings through the use of tokens • Object processing • Breaking multiple data fields out of a single string and configuring an object • Data conversion • Bringing data elements into some common format • Often necessary to combine different processes.
  • 8. Tokenization • Tokenization is the process of splitting up strings. • Based on the idea of a delimiter: a character, such as a comma or a space, that separates one element from the next. • Strings that have a common, delimited structure are amenable to tokenization. • 10,20,30,40 • Jim,Jake,Jane,Johana • Strings are broken up at each delimiter, and the result is an array of strings.
  • 9. Object Processing • Object processing involves creating a ‘blank’ object and setting its attributes from the input. • Often done after tokenization of input. • The end result is an object configured as desired. • One way to restore objects that have been saved (persisted) to files. • May be repeated to create an array of appropriately configured objects.
  • 10. Data Conversion • As part of parsing, we can take the time to convert data into more appropriate representations. • Numbers read in from a file are usually stored as strings. • Various conversion functions can clean up the representation. • atoi, as an example, converts a C-style string such as "42" into the int 42 (call c_str() first when starting from a std::string). • Can convert from rough representations to more precise ones, such as turning the text "19.99" into a double.
  • 11. Example • Consider the following example scenario – calculate the Flesch Readability index of a document. • Need to determine: • Number of sentences • Number of words • Number of syllables in words • Read in as a string from a text file. • Must be parsed.
  • 12. The Hard Way • Can manipulate a string directly. • Count spaces in a string. • Adding one to that gives the word count, roughly. • Count full stops in a string. • That gives the number of sentences. • Syllable count? • Uh… • Horrors upon horrors. • Must parse to get a structure amenable to processing. • An array of strings.
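  • 13. String Processing • The string class provides many useful functions for this kind of parsing. • The find function gives the position of the first occurrence of a character or substring, searching from a given starting index.

```cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str = "Hello World";
    // find returns the index of the first match at or after position 0,
    // or string::npos if there is no match, so store it in a size_t.
    size_t index = str.find("e", 0);
    if (index != string::npos) {
        cout << "Found at: " << index << endl;  // prints "Found at: 1"
    }
}
```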
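  • 14. String Processing • Can use the substr function of a string to extract a substring from a full string; substr takes a starting index and a length.

```cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str = "Hello World";
    // substr(start, length): 5 characters beginning at index 0
    string sub = str.substr(0, 5);
    cout << "Substring is: " << sub << endl;  // prints "Substring is: Hello"
}
```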
  • 15. Working With Strings • Strings also contain a very useful length function. • This tells you how many characters they contain. • Also possible to index a string just like an array. • This lets you get individual characters out of a string. • Can combine these into powerful functions.
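  • 16. Tokenization • Splitting a sentence into words, using a space as the delimiter:

```cpp
#include <iostream>
#include <string>
using namespace std;

int main() {
    string arr[100];  // room for up to 100 tokens
    int size = 0;
    string str = "Snausages are snausages for snausages";
    size_t start = 0;  // index where the current token begins

    // Run one past the last character so the final token is handled
    // as if it were followed by a space.
    for (size_t i = 0; i <= str.length(); i++) {
        if (i < str.length() && str[i] != ' ') {
            continue;  // still inside a token
        }
        if (size < 100) {
            arr[size] = str.substr(start, i - start);  // characters [start, i)
            size += 1;
        }
        start = i + 1;  // the next token starts after the space
    }

    for (int j = 0; j < size; j++) {
        cout << arr[j] << endl;
    }
}
```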
  • 17. Tokenization • There are other ways to tokenize. • This is just one way to show the power of string manipulation. • Serves as a basis for more complex data parsing. • Important to be able to do this; every program's handling of data comes down to parsing at some point or another.
  • 18. Object Representation • Can combine tokenization with object representation. • Tokenize individual elements. • Convert them to appropriate data format. • Use accessor methods on an object to configure. • Can easily set up large amounts of objects with this kind of system. • Combine the objects in an array for the best of both worlds.
  • 19. This Is The End… • That brings us to the end of the scheduled content for C++. • Cheer / Cry as you feel is appropriate. • Next week will be consolidation time. • The Thursday lecture will be a formal revision lecture covering all the topics we have previously met. • The Wednesday lecture/tutorial will be a drop-in revision session. No planned content; come along with whatever questions you have.
  • 20. Some Final Thoughts • Programming is hard. • I did warn you at the start! • It’s also a very rare and valuable skill • One you are well on the way to building properly. • It is a skill that requires training. • Like playing a musical instrument or fighting off ninjas. • Important not to let it slide.
  • 21. Some Final Thoughts • It’s worthwhile keeping a notebook of ‘things I wish I had software to do’. • It can serve as a basis for further exploration of programming. • Don’t worry if you don’t know how to do the things. • Research is a constant part of programming. Nobody knows how to do everything. • Stretching yourself by setting tasks you don’t know how to do is a great way to learn. • Even if you never complete it, the process is valuable.
  • 22. Summary • Parsing is an important part of software development. • It helps you turn unstructured data into structured data. • Comes in many forms. • String parsing is the most immediately useful of these. • Tokenization is a key parsing technique. • Worth playing about with.