Unicode provides a standard way to encode characters from all languages. This introduced the problem of how to represent these characters in memory and storage. Several encoding forms were developed, including UTF-8, UTF-16, and UTF-32, each with their own advantages and disadvantages. UTF-8 became popular as it is backwards compatible with ASCII and uses variable length encoding for efficient storage. Notepad and browsers can determine the encoding of a file or page through a Byte Order Mark or HTTP header respectively. In practice, UTF-8 is commonly recommended due to its efficiency and compatibility.
Introduction of Ethical Hacking, Life cycle of Hacking, Introduction of Penetration testing, Steps in Penetration Testing, Foot printing Module, Scanning Module, Live Demos on Finding Vulnerabilities a) Bypass Authentication b) Sql Injection c) Cross site Scripting d) File upload Vulnerability (Web Server Hacking) Countermeasures of Securing Web applications
Facebook Forensics Toolkit(FFT) is a very simple Forensic Tool to find out people's personal and behavioral information through extracting data from their Facebook profile .
Introduction of Ethical Hacking, Life cycle of Hacking, Introduction of Penetration testing, Steps in Penetration Testing, Foot printing Module, Scanning Module, Live Demos on Finding Vulnerabilities a) Bypass Authentication b) Sql Injection c) Cross site Scripting d) File upload Vulnerability (Web Server Hacking) Countermeasures of Securing Web applications
Facebook Forensics Toolkit(FFT) is a very simple Forensic Tool to find out people's personal and behavioral information through extracting data from their Facebook profile .
Cyber security and demonstration of security toolsVicky Fernandes
Presentation on Cybersecurity and demonstration of security tools, conducted by Vicky Fernandes on 10th September 2019 at Don Bosco Institute of Technology, Mumbai.
This presentation gives a brief description about IP Address (Internet protocol address), Classes of IPv4. And also included, what is IPv4 and what is IPv6.
Ayman Sheta Microsoft certified Trainer
www.asheta.com
This PP will show the step by step installation of Windows Server 2016 TP4 in a virtual machine which I have created for an online presentation .
Introduction to WWW, History of Web
Protocols governing web
Cyber Crime
Cyber Laws
IT Act 2000
Web Development Strategies, Planning and Development
Web Applications
Web Development Process
Web Team
My talks at Voxxed Days Zurich 2016. This is about he history of character encodings and unicode. And it's all about APIs stuck in the 90ies where things were very different.
Character sets and collations are am important part of the database setup. In this presentation I show you the history of character sets and how they are used today, how UTF-8 works and how MySQL handles all this.
Cyber security and demonstration of security toolsVicky Fernandes
Presentation on Cybersecurity and demonstration of security tools, conducted by Vicky Fernandes on 10th September 2019 at Don Bosco Institute of Technology, Mumbai.
This presentation gives a brief description about IP Address (Internet protocol address), Classes of IPv4. And also included, what is IPv4 and what is IPv6.
Ayman Sheta Microsoft certified Trainer
www.asheta.com
This PP will show the step by step installation of Windows Server 2016 TP4 in a virtual machine which I have created for an online presentation .
Introduction to WWW, History of Web
Protocols governing web
Cyber Crime
Cyber Laws
IT Act 2000
Web Development Strategies, Planning and Development
Web Applications
Web Development Process
Web Team
My talks at Voxxed Days Zurich 2016. This is about he history of character encodings and unicode. And it's all about APIs stuck in the 90ies where things were very different.
Character sets and collations are am important part of the database setup. In this presentation I show you the history of character sets and how they are used today, how UTF-8 works and how MySQL handles all this.
Unicode - Hacking The International Character SystemWebsecurify
In this presentation we explore some of the problems of unicode and how they can be used for nefarious purposes in order to exploit a range of critical vulnerabilities including SQL Injection, XSS and many other.
our application is great – and popular. You have translation efforts underway, everything is going well – and wait a minute, what’s the report of strange question mark characters all over the page? Unicode is pain. UTF-32, UTF-16, UTF-8 and then something else is thrown in the mix … Multibyte and codepoints, it all sounds like greek. But it doesn’t have to be so scary. PHP support for Unicode has been improving, even without native unicode string support. Learn the basics of unicode is and how it works, why you would add support for it in your application, how to deal with issues, and the pain points of implementation.
How Unidecoder Transliterates UTF-8 to ASCIISimon Courtois
Slides of my talk at Paris.rb on 2014-11-07. How does UTF-8 work? How to leverage it to convert chinese, russian or any non-ASCII character to ASCII? Here is what the Unidecoder gem does.
This lesson is for students taking the Cambridge School certificate exams Computer science subject(2210).I hope that it will of help to students in this period of crisis. Send me your feedback or suggestions on buxooa72@ gmail.com,
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
5. Let’s start from the beginning
Once upon a time, when life was simpler
(for Americans), we had ASCII
6. ASCII
• Designed for teleprinters in the 1960s
• 7 bit (0000000 to 1111111 in binary)
• 128 codes (0 to 127 in decimal)
0 1 0 0 0 0 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 1
A
B
C
(65 in decimal)
(66 in decimal)
(67 in decimal)
7. Why waste this 1 bit?
0 1 0 0 0 0 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 1
A
B
C
(65 in decimal)
(66 in decimal)
(67 in decimal)
Meanwhile, 8-bit byte were becoming common
(aka,128 more spaces for characters)
🤔
8. Everyone had the same idea-
Let’s Extend ASCII
And use all those 256 characters
10. Code Pages
• There are too many (more than 220 DOS and Windows
code pages alone)
• IBM, Apple all introduced their own “code pages” 🤛 🤜
• Side-note: ANSI character set has no well-defined
meaning, they are a collection of 8-bit character sets
compatible with ASCII but incompatible with each other
11. Problems
• Most are compatible with ASCII but incompatible with
each other
• Programs need to know what code page to use in order
to display the contents correctly
• Files created on one machine may be unreadable on
another
• Even 256 characters are not enough %
14. Unicode
• Finally everyone agreed on what code point mapped to
what character 🎉
• There’s room for over 1 million code points (characters),
though the majority of common languages fit into the first
65536 code points
17. UTF-32
• Simplest encoding
• Unicode supports 1,114,112 code points. We can store
them using 21 bits
• UTF-32 is 32-bit in length. Fixed length encoding
• So we take a 21-bit value and simply zero-pad the value
out to 32 bits
18. UTF-32
• Example
A in ASCII 01000001
A in UTF-32 00000000 00000000 00000000 01000001
or, 01000001 00000000 00000000 00000000
• So, not compatible with ASCII
• Super wasteful
• But faster text operations (e.g character count)
• Messes where “null terminated strings” (00000000) are
expected
19. UTF-32
• Also, now we have to deal with “Endianness” 🤔
00000000 00000000 00000000 01000001
vs,
01000001 00000000 00000000 00000000
20. Endianness
• Big endian machine: Stores data big-end first. When looking at
multiple bytes, the first byte is the biggest
increasing addresses ———->
00000000 00000000 00000000 01000001
• Little endian machine: Stores data little-end first. When looking at
multiple bytes, the first byte is smallest
increasing addresses ———->
01000001 00000000 00000000 00000000
• Endianness does not matter if you have a single byte, because
how we read a single byte is same in all machines 😌
21. UTF-16
• The oldest encoding for Unicode. Often mislabeled as
"Unicode encoding”
• Variable length encoding. 2 bytes for most common
characters (BMP), 4 bytes for everything else
• The most common characters (BMP) in Unicode fits into first
65,536 code points, so it’s straightforward. Throw away top 5
zeros from 21 bit, you get UTF-16.
A 00000 00000000 01000001 (21 bit) becomes-
A 00000000 01000001 (16 bit)
• Uses “Surrogate pairs” for other characters
22. UTF-16
• Multi-byte encoding, so has Endianness like UTF-32
• Incompatible with ASCII
A in UTF-16 00000000 01000001
A in ASCII 01000001
• Incompatible with old systems that rely on null (null byte:
00000000) terminated strings
• Uses less space than UTF-32 in practice
• Windows API, .NET and Java environments are founded on
UTF-16, often called “wide character string”
23. UTF-8
• Nice & simple. This is the kid that everyone loves *
• Backward compatible with ASCII 🙌
• 0 to 127 code points are stored as regular, single-byte
ASCII.
A in ASCII 01000001
A in UTF-8 01000001
* UTF-8 Everywhere Manifesto: https://utf8everywhere.org/
24. UTF-8
• Code points 128 and above are converted to binary and
stored (encoded) in a series of bytes
A 2-byte example looks like this
110xxxxx 10xxxxxx
In contrast, single byte ASCII characters (<128 decimal
code points) look like 0xxxxxxx
Count byte
Starts with
11..0
Data byte
Starts with
10
25. UTF-8
• A 2-byte example looks like this
110xxxxx 10xxxxxx
(Count Byte) (Data Byte)
• The first count byte indicates the number of bytes for the code-point,
including the count byte. These bytes start with 11..0:
110xxxxx (The leading “11” is indicates 2 bytes in sequence, including the
“count” byte)
1110xxxx (1110 -> 3 bytes in sequence)
11110xxx (11110 -> 4 bytes in sequence)
• After count bytes, data bytes starting with 10… and contain information
for the code point
26. UTF-8
• Example:
অ 'BENGALI LETTER A' (U+0985)
Binary form: 00001001 10000101
0000 100110 000101
UTF-8 form: 11100000 10100110 10000101
• Variable length encoding. Uses more space than UTF-16
for text with mostly Asian characters (2 bytes vs 3 bytes),
less space than UTF-16 for text with mostly ASCII
characters (1 byte vs 2 bytes)
27. UTF-8
• No null bytes. Old programs work just fine that treats null
bytes (00000000) as end of string
• We read and write a single byte at a time, so no worry of
Endianness. This is very convenient
• UTF-8 is the encoding you should use if you work on web
28. How does Notepad know which encoding was used when it opens a file?
29. BOM
(Byte Order Mask)
BOM Encoding
00 00 FE FF UTF-32, big-endian
FF FE 00 00 UTF-32, little-endian
FE FF UTF-16, big-endian
FF FE UTF-16, little-endian
EF BB BF UTF-8
30. How do browsers know which encoding was used when it opens a page?
33. Practical Considerations
• If you are working on web, use UTF-8
• If your operation is mostly with GUI and calling windows APIs with Unicode
string, use UTF-16
• UTF-16 takes least space than UTF-8 and UTF-16 if most characters are
Asian
• UTF-8 takes least space if most characters are Latin
• If memory is cheap and you need fastest operation, random access to
characters etc, use UTF-32
• If dealing with Endianness & BOM is a problem, then use UTF-8
• When in doubt, use UTF-8 😛