What one needs to know to work in the Natural Language Processing field, and the aspects of developing an NLP project, using the example of a system to identify text language
Getting started on your natural language processing project? First you'll need to extract some features from your corpus. Frequency counts, syntax parsing, and word vectors are good ones to start with.
Monthly AI Tech Talks in Toronto 2019-08-28
https://www.meetup.com/aittg-toronto
The talk will cover the end-to-end details, including contextual and linguistic feature extraction, vectorization, n-grams, topic modeling, and named entity resolution, which are based on concepts from mathematics, information retrieval and natural language processing. We will also dive into more advanced feature engineering strategies such as word2vec, GloVe and fastText that leverage deep learning models.
In addition, attendees will learn how to combine NLP features with numeric and categorical features and analyze the feature importance from the resulting models.
The following libraries will be used to demonstrate the aforementioned feature engineering techniques: spaCy, Gensim, fastText and Keras in Python.
https://www.meetup.com/aittg-toronto/events/261940480/
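As a taste of the frequency and n-gram features mentioned above, here is a minimal sketch using only the Python standard library; the function names are illustrative, and the talk itself relies on spaCy, Gensim, fastText and Keras instead:

```python
from collections import Counter

def ngrams(tokens, n):
    # Sliding window of length n over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_features(text, n=2):
    # Token frequencies plus n-gram frequencies: the simplest corpus features.
    tokens = text.lower().split()
    return Counter(tokens), Counter(ngrams(tokens, n))

unigrams, bigrams = frequency_features("to be or not to be")
# unigrams["to"] == 2 and bigrams[("to", "be")] == 2
```

These raw counts are typically normalized (e.g. TF-IDF) before being fed to a model.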
The talk was given at a seminar of the InfinIT interest group Højniveau sprog til indlejrede systemer (High-Level Languages for Embedded Systems) on 11 November 2009.
Read more about the interest group at http://www.infinit.dk/dk/interessegrupper/hoejniveau_sprog_til_indlejrede_systemer/
Evolving as a professional software developer (Anton Kirillov)
This is the second edition of my keynote "On Being a Professional Software Developer", with slide comments (in Russian) that contain the main ideas of the keynote.
I hope the slides could be used as a standalone reading material.
Presentation introducing LISP, looking at the history and concepts behind this powerful programming language.
Presentation by Tijs van der Storm for the September 2012 Devnology meetup at the Mirabeau offices in Amsterdam.
The talk was given at a seminar of the InfinIT interest group Højniveausprog til Indlejrede Systemer (High-Level Languages for Embedded Systems) on 2 October 2013. Read more about the interest group here: http://infinit.dk/dk/interessegrupper/hoejniveau_sprog_til_indlejrede_systemer/hoejniveau_sprog_til_indlejrede_systemer.htm
One of the main advantages of PHP is that it allows you and your company to build up projects in no time, with immediate feedback and business value. Sometimes, however, fast growth and unforeseen complexity can make your codebase more and more difficult to manage as time passes and new features are added. Domain-Driven Design can be an elegant solution to the problem, but introducing it in mid-to-large-sized projects is not always easy: you have to deal with difficulties at the technical, team and knowledge levels. This talk focuses on how to approach the change in your codebase and in your team's mindset without breaking legacy code or stopping development in favor of never-ending refactoring sessions.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Enhancing adoption of Open Source Libraries: A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
3. Can programming be liberated
from the von Neumann style?
-- John Backus, 1978
https://www.cs.cmu.edu/~crary/819-f09/Backus78.pdf
4. Conventional programming languages are growing ever more
enormous, but not stronger. Inherent defects at the most
basic level cause them to be both fat and weak: their
primitive word-at-a-time style of programming inherited
from their common ancestor--the von Neumann computer, their
close coupling of semantics to state transitions, their
division of programming into a world of expressions and a
world of statements, their inability to effectively use
powerful combining forms for building new programs from
existing ones, and their lack of useful mathematical
properties for reasoning about programs.
An alternative functional style of programming is founded
on the use of combining forms for creating programs.
Functional programs deal with structured data, are often
nonrepetitive and nonrecursive, are hierarchically
constructed, do not name their arguments, and do not
require the complex machinery of procedure declarations to
become generally applicable. Combining forms can use high
level programs to build still higher level ones in a style
not possible in conventional languages.
7. The Value of Programming
Paradigms
• To be taught in universities
8. The Value of Programming
Paradigms
• To be taught in universities
• To ignite flamewars
9. The Value of Programming
Paradigms
• To be taught in universities
• To ignite flamewars
• To characterize programming languages
10. The Value of Programming
Paradigms
• To be taught in universities
• To ignite flamewars
• To characterize programming languages
• To inspire memes
13. Programming is a Pop Culture
Binstock: You once referred to computing as pop culture.
Kay: It is. Complete pop culture. I'm not against pop culture. Developed
music, for instance, needs a pop culture. There's a tendency to over-
develop. Brahms and Dvorak needed gypsy music badly by the end of the 19th
century. The big problem with our culture is that it's being dominated,
because the electronic media we have is so much better suited for
transmitting pop-culture content than it is for high-culture content. I
consider jazz to be a developed part of high culture. Anything that's been
worked on and developed and you [can] go to the next couple levels.
Binstock: One thing about jazz aficionados is that they take deep pleasure
in knowing the history of jazz.
Kay: Yes! Classical music is like that, too. But pop culture holds a disdain
for history. Pop culture is all about identity and feeling like you're
participating. It has nothing to do with cooperation, the past or the future
— it's living in the present. I think the same is true of most people who
write code for money. They have no idea where [their culture came from] —
and the Internet was done so well that most people think of it as a natural
resource like the Pacific Ocean, rather than something that was man-made.
When was the last time a technology with a scale like that was so error-
free? The Web, in comparison, is a joke. The Web was done by amateurs.
http://www.drdobbs.com/architecture-and-design/interview-with-alan-kay/240003442
14. The Real Value of
Programming Paradigms
• Taming complexity
• Scaling programming
15. The Real Value of
Programming Paradigms
• Taming complexity
• Scaling programming
There are only two hard problems
in Computer Science:
cache invalidation
and naming things.
-- Phil Karlton
16. How?
• Promoting key principles
• Discouraging some approaches
(something should be
“considered harmful”)
17. “Cache Invalidation”
SP: global state considered harmful
OOP: state is encapsulated in objects
FP: state is passed in function calls
“Closures + Hash Tables = As Much OOP
as You’ll Ever Need”
https://lispy.wordpress.com/2007/07/09/closures-hash-tables-as-much-oop-as-youll-ever-need/
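The slogan of the linked post can be sketched in a few lines. Below is an illustrative Python rendering (not from the slides): a closure captures mutable state, and a hash table of closures plays the role of the method table:

```python
def make_counter(start=0):
    # State is captured by the closure instead of living in an object field.
    state = {"count": start}

    def increment():
        state["count"] += 1
        return state["count"]

    def value():
        return state["count"]

    # The hash table of "methods" is the object.
    return {"increment": increment, "value": value}

counter = make_counter()
counter["increment"]()
counter["increment"]()
# counter["value"]() == 2
```

Each call to `make_counter` produces an independent "instance", since each call creates a fresh enclosed `state`.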
18. Levels of Paradigm Coverage
• In-the-small
• In-the-large
• In-between
From the linguistic point-of-view:
• Syntax
• Semantics
• Pragmatics
19. Functional Programming
The good:
• Everything is an expression that
returns a value [in-the-small]
• Referential transparency
[in-the-large]
• Composition [in-between]
The ugly:
• mutable state considered harmful
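The composition bullet can be made concrete in a line of Python; a minimal sketch, with illustrative function names:

```python
def compose(f, g):
    # "f after g": the basic combining form of the functional style.
    return lambda x: f(g(x))

def inc(x):
    return x + 1

def double(x):
    return x * 2

inc_then_double = compose(double, inc)
# Referential transparency lets us reason locally:
# inc_then_double(3) == double(inc(3)) == 8
```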
23. Funcprog Patterns
• Functions all the way down
• Transformation-oriented programming
• Parametrize all the things
• Be as generic as possible
• Partial application
• Continuations
• Monads (composition of two-track
functions): use bind to chain
options/tasks/error handlers
• “Railway-oriented” programming
• Use map to lift functions (functors)
• Use monoids for aggregation
(map-reduce)
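To make the bind and railway bullets concrete, here is a hedged sketch (not from the slides) of two-track functions in Python: each step returns `("ok", value)` or `("error", message)`, and `bind` keeps failures on the error track:

```python
def bind(f):
    # Lift a one-track function so it can be chained:
    # errors bypass f, successes flow through it.
    def bound(result):
        tag, payload = result
        return f(payload) if tag == "ok" else result
    return bound

def parse_int(s):
    return ("ok", int(s)) if s.lstrip("-").isdigit() else ("error", "not a number")

def check_positive(n):
    return ("ok", n) if n > 0 else ("error", "must be positive")

def pipeline(s):
    # Chain the two-track functions; the first error short-circuits the rest.
    return bind(check_positive)(parse_int(s))

# pipeline("42")  -> ("ok", 42)
# pipeline("-5")  -> ("error", "must be positive")
# pipeline("abc") -> ("error", "not a number")
```

This is the same shape as chaining options, tasks or error handlers with a monadic bind, just spelled with tuples instead of a dedicated Result type.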
24. Dealing with Errors
There are only two hard problems
in Computer Science:
cache invalidation
naming things,
and off-by-one errors.
-- Anonymous
25. Type checking
The fundamental observation was that while memory is untyped,
the operations are typed and yield bad results if given
values that were expected to be of a different type than they
actually are, but the computer was utterly unable to report
any of this as a problem because by the time the operations
got their values, the binary representation was assumed to be
that of the type the operations had advertised that they
required. The reason for the compile-time type analysis is
precisely that the execution of the program has no idea
whatsoever which of the many types that map onto the
binary representation the actual value in
memory is of, and the kind of mistakes that
were made in the past when programmers had
to keep track of this thing by hand was
very expensive.
-- Erik Naggum
https://groups.google.com/forum/#!topic/comp.lang.lisp/7nhbeh2NIuw%5B126-150%5D
26. Static Typing
Types are declared for both values and
variables. And they are checked at
compile-time.
Applications:
• Check program correctness
• Inform compiler optimizations
• A tool for domain modeling
• Executable documentation
27. Type-oriented
Design Patterns
• Strive for purity
• Use types to represent constraints
• Types are cheap
• Strive for totality
• Use types to indicate errors
• Make illegal states unrepresentable
• Use sum types instead of inheritance
• Use sum types for state machines
• It's ok to expose public data
• Types are executable documentation
29. Types vs Structures/Objects
Terminology:
• Product types ~ Structures
• Sum types ~ Unions, Inheritance
• Pattern matching ~ Destructuring
“Favor object composition over class
inheritance.”
30. Dynamic Typing
Values have types, variables don't.
Type-checking happens at run-time.
Address static typing limitations:
• Additional burden on the programmer
• Not all correct programs typecheck
• Limit program development workflow,
rule out some scenarios (interactive
and exploratory programming)
• Premature optimization
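The first line of the slide can be demonstrated directly in Python, a dynamically typed language: the type travels with the value, not with the name, and type errors surface at run time:

```python
# The same name can be bound to values of different types.
x = 42
assert type(x) is int
x = "forty-two"
assert type(x) is str

# Type checking happens at run time, when an operation
# actually meets a value of the wrong type.
caught = False
try:
    "a" + 1
except TypeError:
    caught = True
assert caught
```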
31. A request for more static type checking in Common Lisp is
regarded as a throw-back to the times before we realized
that disjointness is in the eye of the beholder, or as a
missing realization that disjointness does not exist in the
real world and therefore should not exist in the virtual
world we create with our software. Just because computers
are designed a particular way that makes certain types of
values much more efficient to compute with than others, does
not mean that efficiency is /qualitative/. Efficiency is
only quantitative and subordinate to correctness. It is a
very serious error in the Common Lisp world to write a
function that returns the wrong result quickly, but does not
know that it was the wrong result. For this reason, type
correctness is considered to be the responsibility of the
function that makes the requirements, not of the caller or
the compiler. If the programmer who makes those
requirements is sufficiently communicative, however, the
compiler should come to his assistance. The default
behavior, on the other hand, is that functions have to
accept values of type T.
-- Erik Naggum
32. Gradual/Optional Typing
The Common Lisp declaration facility
(declare (ftype (function (integer list) t) ith)
(ftype (function (number) float) sine))
33. Gradual/Optional Typing
The Common Lisp declaration facility
(declare (ftype (function (integer list) t) ith)
(ftype (function (number) float) sine))
You can also declare other properties:
• (declare (dynamic-extent item))
• (declare (ignore dummy))
• (declare (call-in traverse))
• (declare (declaration call-in))
34. Example
> (declaim (ftype (function (string string) string) s+))
> (defun s+ (s1 s2)
    (format nil "~A~A" s1 s2))
> (defun s++ (s1 s2 s3)
    (declare (type integer s3))
    (s+ s1 (s+ s2 s3)))
; in: DEFUN S++
; (S+ S2 S3)
;
; caught WARNING:
; Derived type of S3 is
; (VALUES INTEGER &OPTIONAL),
; conflicting with its asserted type
; STRING.
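Python's optional annotations play a role loosely analogous to these declarations: they are ignored at run time by default, a static checker such as mypy can verify them, and a program can also inspect them itself. A minimal, illustrative sketch (`checked_call` is a toy helper invented for this example):

```python
def s_plus(s1: str, s2: str) -> str:
    # The annotations are optional hints; plain Python never enforces them.
    return f"{s1}{s2}"

def checked_call(fn, *args):
    # A toy run-time check driven by the same annotations.
    hints = fn.__annotations__
    params = [name for name in hints if name != "return"]
    for name, value in zip(params, args):
        if not isinstance(value, hints[name]):
            raise TypeError(f"{name} should be {hints[name].__name__}")
    return fn(*args)

# checked_call(s_plus, "foo", "bar") -> "foobar"
# checked_call(s_plus, "foo", 3)     -> TypeError, like the compiler warning above
```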
36. Cons of Gradual Typing
“Why we’re no longer using Core.typed”
• Slower iteration time
• Core.typed does not implement the
entire Clojure language
• Third-party code is not covered
http://blog.circleci.com/why-were-no-longer-using-core-typed/
37. Fractal Programming
The Domain Layer
This is where all the actual domain rules are defined. In
general that means one or more domain specific languages.
This part of the system is what needs to be malleable enough
that it should be possible to change rules in production,
allow domain experts to do things with it, or just plain a
very complicated configuration.
The Dynamic & Stable Layers
The stable layer is the core set of axioms, the hard kernel
or the thin foundation that you can build the rest of your
system in. There are definitely advantages to having this
layer be written in an expressive language, but performance
and static type checking is most interesting here. There is
always a trade-off in giving up static typing, and the point
of having this layer is to make that trade-off smaller. The
dynamic layer runs on top of the stable layer, utilizing
resources and services provided. This is where all
interfaces are defined. But the implementations for them
live in the dynamic layer, not in the stable one. By doing it
this way you can take advantage of static type information
for your API’s while still retaining full flexibility in
implementation of them. It should be fairly small compared
to the rest of the application, and just provide the base
necessary services needed for everything to function.
-- Ola Bini
https://olabini.com/blog/2008/06/fractal-programming/
38. Static typing is a powerful tool
to help programmers express their
assumptions about the problem
they are trying to solve and
allows them to write more concise
and correct code. Dealing with
uncertain assumptions, dynamism
and (unexpected) change is
becoming increasingly important
in a loosely coupled distributed
world. Instead of hammering on
the differences between
dynamically and statically typed
languages, we should instead
strive for a peaceful integration
of static and dynamic aspects in
the same language. Static typing
where possible, dynamic typing
when needed!
-- Erik Meijer
https://www.ics.uci.edu/~lopes/teaching/inf212W12/readings/rdl04meijer.pdf
39. Summary
• Discussed programming paradigms
• Defined functional programming
• Established that
functional programming ≠ static typing
• Discussed static and dynamic typing
• Described gradual/optional typing