This document discusses character encodings and provides tips for properly handling encodings in programming. It begins with definitions of characters, scripts, and the need for character sets. It then discusses commonly used character sets like ASCII and Unicode. UTF-8, UTF-16, and UTF-32 encodings are explained as they allow representing all Unicode characters using variable number of bytes. The document concludes with programming language-specific tips and functions for detecting, parsing, and writing encodings in languages like PHP, Java, Objective-C, and C#.
our application is great – and popular. You have translation efforts underway, everything is going well – and wait a minute, what’s the report of strange question mark characters all over the page? Unicode is pain. UTF-32, UTF-16, UTF-8 and then something else is thrown in the mix … Multibyte and codepoints, it all sounds like greek. But it doesn’t have to be so scary. PHP support for Unicode has been improving, even without native unicode string support. Learn the basics of unicode is and how it works, why you would add support for it in your application, how to deal with issues, and the pain points of implementation.
Zoe Slattery's slides from PHPNW08:
The ability to store large quantities of local data means that many applications require some form of text search and retrieval facility. From the point of view of the application developer there are a number of choices to make, the first is whether to use a complete packaged solution or whether to use one of the available information libraries to build a custom information retrieval (IR) solution. In this talk I’ll look at the options for PHP programmers who choose to embed IR facilities within their applications.
For Java programmers there is clearly a good range of options for text retrieval libraries, but options for PHP programmers are more limited. At first sight for a PHP programmer wishing to embed indexing and search facilities in their application, the choice seems obvious - the PHP implementation of Lucene (Zend Search Lucene). There is no requirement to support another language, the code is PHP therefore easy for PHP programmers to work with and the license is commercially friendly. However, whilst ease of integration and support are key factors in choice of technology, performance can also be important; the performance of the PHP implementation of Lucene is poor compared to the Java implementation.
In this talk I’ll explain the differences in performance between PHP implementation of Lucene and the Java implementation and examine the other options available to PHP programmers for whom performance is a critical factor.
our application is great – and popular. You have translation efforts underway, everything is going well – and wait a minute, what’s the report of strange question mark characters all over the page? Unicode is pain. UTF-32, UTF-16, UTF-8 and then something else is thrown in the mix … Multibyte and codepoints, it all sounds like greek. But it doesn’t have to be so scary. PHP support for Unicode has been improving, even without native unicode string support. Learn the basics of unicode is and how it works, why you would add support for it in your application, how to deal with issues, and the pain points of implementation.
Zoe Slattery's slides from PHPNW08:
The ability to store large quantities of local data means that many applications require some form of text search and retrieval facility. From the point of view of the application developer there are a number of choices to make, the first is whether to use a complete packaged solution or whether to use one of the available information libraries to build a custom information retrieval (IR) solution. In this talk I’ll look at the options for PHP programmers who choose to embed IR facilities within their applications.
For Java programmers there is clearly a good range of options for text retrieval libraries, but options for PHP programmers are more limited. At first sight for a PHP programmer wishing to embed indexing and search facilities in their application, the choice seems obvious - the PHP implementation of Lucene (Zend Search Lucene). There is no requirement to support another language, the code is PHP therefore easy for PHP programmers to work with and the license is commercially friendly. However, whilst ease of integration and support are key factors in choice of technology, performance can also be important; the performance of the PHP implementation of Lucene is poor compared to the Java implementation.
In this talk I’ll explain the differences in performance between PHP implementation of Lucene and the Java implementation and examine the other options available to PHP programmers for whom performance is a critical factor.
This course is a complete package that helped me to learn Python Programming from basic to an intermediate level. The course curriculum has been divided into 4 weeks where we can practice questions & attempt the assessment tests according to your own pace. The course offers us a wealth of programming challenges that will help you to prepare for interviews with top-notch companies like Microsoft, Amazon, Adobe etc
Unicode - Hacking The International Character SystemWebsecurify
In this presentation we explore some of the problems of unicode and how they can be used for nefarious purposes in order to exploit a range of critical vulnerabilities including SQL Injection, XSS and many other.
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
n the halcyon days of early 2005, a project was launched to bring long overdue native Unicode and internationalization support to PHP. It was deemed so far reaching and important that PHP needed to have a version bump. After more than 4 years of development, the project (and PHP 6 for now) was shelved. This talk will introduce Unicode and i18n concepts, explain why Web needs Unicode, why PHP needs Unicode, how we tried to solve it (with examples), and what eventually happened. No sordid details will be left uncovered.
This course is a complete package that helped me to learn Python Programming from basic to an intermediate level. The course curriculum has been divided into 4 weeks where we can practice questions & attempt the assessment tests according to your own pace. The course offers us a wealth of programming challenges that will help you to prepare for interviews with top-notch companies like Microsoft, Amazon, Adobe etc
Unicode - Hacking The International Character SystemWebsecurify
In this presentation we explore some of the problems of unicode and how they can be used for nefarious purposes in order to exploit a range of critical vulnerabilities including SQL Injection, XSS and many other.
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
n the halcyon days of early 2005, a project was launched to bring long overdue native Unicode and internationalization support to PHP. It was deemed so far reaching and important that PHP needed to have a version bump. After more than 4 years of development, the project (and PHP 6 for now) was shelved. This talk will introduce Unicode and i18n concepts, explain why Web needs Unicode, why PHP needs Unicode, how we tried to solve it (with examples), and what eventually happened. No sordid details will be left uncovered.
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...gagravarr
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are Apache projects out there which can help!
In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to do all of this with Apache Tika, and how to dive down to the underlying libraries (including its Apache friends like POI and PDFBox) for specialist cases. Finally, we'll look a little bit about how to roll this all out on a Big Data or Large-Search case.
SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL SAMCSCMLA SCACLSALS CS L LSLSL
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
2. Introduction
• I am Pritam Barhate. I am cofounder and CTO of Mobisoft Infotech, which is a
iPhone & iPad Application Development Company.
• Mobisoft also has expertise in creating scalable cloud based API backends
for iOS and Android applications.
• For more information about various services provided by our company,
please check our website.
3. Definitions
• Character
A character is a sign or a symbol in a writing system. In computing a
character can be, a letter, a digit, a punctuation or mathematical symbol or a
control character (these are not visible, for example, carriage return).
• Script
A script is a collection of letters and other written signs used to represent
textual information in one or more writing systems. For example, Latin is a
script which supports multiple languages like English, French and German.
4. The Need for Character Sets
• Computers only understand binary data. To represents the characters as required
by human languages, the concept of character sets was introduced.
• In character sets each character in a human language is represented by a number.
• In early computing English was the only language used. To represent, the
characters used in English, ASCII character set was used.
• ASCII used 7 bits to represent 128 characters which included: numbers 0 to 9,
lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols,
control codes that originated with Teletype machines, and a space.
• For example, in ASCII, character O is represented by decimal number 79. Same
can be written in binary format as: 1001111 or in hexadecimal (hex) format as: 4F.
5. Evolution!
• Naturally ASCII was insufficient to represent characters in all human
languages. This led to creation to multiple character sets as computing
evolved and became more widespread throughout the world. However, as
computing evolved, multiple companies created different computing platforms
which had support for different character sets.
• Various attempts were made by different platform providers to extend the
currently supported character sets while maintaining backward compatibility.
This created a lot of confusion when data created by program using X
encoding was to be used in a program which uses Y encoding as its default.
• Finally Unicode emerged as a common standard to support majority of written
scripts by people throughout the world.
6. Some Commonly Used Character Sets
• ASCII
• Unicode
• Windows 1252
• ISO 8859
8. UTF-8, UTF-16 and UTF-32
• Since there are thousands of characters in Unicode, just one byte is not
sufficient to represent every character.
• Hence multiple bytes are required to represent Unicode characters.
• Though 32 bits or 4 bytes are sufficient to represent every character in
Unicode, most of the data is in English and Western European languages. So
using uniform 32 bits everywhere is waste of a lot of space.
• Hence, UTF-8 and UTF-16 are more prevalent. These encodings use
variable number of bytes to represent each character as needed.
• UTF-8 is the most prevalent encoding used on web.
• All modern Windows OSes and Java SE 5.0 onwards use UTF-16.
9. How do Encodings Affect Programming?
• When data is transferred between various computing systems, you need to
carefully check in which encoding the data is coming to your system and in
which encoding you are providing the data to other systems.
• An error in this generally results part or all of data being interpreted as
garbage.
10. Tips to Handle Encoding Properly in your Programs - I
• When saving your web pages, which are read by browsers, make sure that
you are using the correct encoding while saving your file. Most of the text
editors have the encoding setting buried deep somewhere in
preferences/options. Take out the time to learn how your favorite text editor /
IDE handles file encodings.
• In static HTML files, as well as dynamically generated HTML always use
proper charset attribute value for meta tag. Make sure that you are indeed
outputting the data in the character set mentioned by you. For example:
<head><meta charset="UTF-8"></head>
11. Tips to Handle Encoding Properly in your Programs - II
• While outputting data for dynamic web pages and APIs make sure that right
Content-Type header is being sent to the client with the response.
For example: Content-Type: text/html; charset=utf-8
• While creating the database in MySQL or PostgreSQL, make sure that you
specify the correct character encoding for your tables.
• For new programs, which use MySQL, when you create the database for
‘Default Collation’ setting choose: utf8 - utf8_unicode_ci.
• For new systems which use PostgreSQL, when you create the database
choose UTF-8 as encoding and en_US.UTF-8 as collation.
12. Tips to Handle Encoding Properly in your Programs - III
• While importing data from text based files, such as CSV, make sure that you
detect the encoding of the file and then read the data using that encoding.
Then while saving the data in your system, make sure that you have
converted to the character set that your system uses. If you are using a library
to do the CSV reading and writing for you, take out the time to understand
how the library handles character encodings. Choose a library only after
making sure that it has a robust support for various character encodings.
• Take out time to understand how the programming language / libraries, you
are using, handle character encoding while parsing files and parsing API
output such as JSON / XML.
13. Useful PHP Functions to Handle Character Encodings
• Most the PHP functions to read files (for example, file_get_contents)
return data as string.
• Unfortunately in PHP strings are just byte arrays. They don’t store any
encoding information. You can use following functions to detect the encoding
of a string.
mb_detect_encoding — Detect character encoding
mb_convert_encoding — Convert character encoding
• However, the implementations of these functions have been criticized online.
So use them with caution.
14. How to Handle Character Encoding in Java
• In Java, you can use InputStreamReader.getEncoding()to detect encoding.
• The juniversalchardet library which is a Java port of universalchardet library
from Mozilla can also be used to detect character encodings in Java.
• Once you have a byte array of data and you have detected the character
encoding for it, you can use the following String constructor to create a string
with proper encoding: public String(byte[] bytes, Charset charset)
• To save a file with desired character encoding, you need to pass the proper
character set to the OutputStreamWriter class. Here is an example of how to
do it.
15. How to Handle Character Encoding in Objective-C
• You use the UniversalDetector library to detect character encodings. This
library is a wrapper for uchardet, which is based on C++ implementation of
the universal charset detection library by Mozilla.
• Once you have detected the encoding of a byte buffer, you can pass that
encoding to NSString constructor to create the correct string with data. For
example: -(instancetype)initWithData:(NSData *)data
encoding:(NSStringEncoding)encoding
• To write a string to a file with desired encoding: - (BOOL)writeToFile:(NSString
*)path atomically:(BOOL)useAuxiliaryFile encoding:(NSStringEncoding)enc
error:(NSError * _Nullable *)error
16. How to Handle Character Encoding in C#
• To detect character encodings in C#, you can use the chardetsharp library,
which is a port of Mozilla’s CharDet Character Set Detector.
• Once you have detected the character set of the byte array, you can use
something similar to this example to obtain a correct C# string object from the
data.
• To write a C# string to a file with a desired encoding, you can follow the
examples provided here.