These notes from a statistics course (Stat405) cover floating point math, optimisation, continuing education, and professional development. They discuss problems that arise from floating point arithmetic, give tips for optimising code, and recommend ways to keep learning: reading books and papers, practising communication skills, and managing email.
26 Development
1. Stat405 Development
Hadley Wickham
Monday, 30 November 2009
2. 1. Floating point math
2. Optimisation
3. Continuing education
4. Feedback
3. Your turn
Perform the following calculations in R.
Are the answers what you expect?
seq(0.1, 0.9, by = 0.1) - 1:9 / 10
sqrt(2)^2 - 2
What is the property of these numbers
that might cause the problem?
4. # Each number must be stored in a finite amount of space
# => each number can only have a finite number of digits
# => floating point math does not work like normal math
(1e-16 + 1) == 1
(1e-16 + 1) * 10 == 1e-16 * 10 + 1 * 10
1e9 + 2 - 0.1 - 1e9
1e10 + 2 - 0.1 - 1e10
1e11 + 2 - 0.1 - 1e11
1e12 + 2 - 0.1 - 1e12
1e13 + 2 - 0.1 - 1e13
1e14 + 2 - 0.1 - 1e14
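The behaviour on this slide can be explored a little further with a minimal sketch. It assumes standard IEEE double precision, which is what R uses on common platforms; the variable names `lost`, `kept` and `drift` are illustrative and not from the slides.

```r
# Machine epsilon: the gap between 1 and the next larger double
eps <- .Machine$double.eps

# Adding something smaller than eps / 2 to 1 is lost entirely ...
lost <- (1 + 1e-16) == 1

# ... while adding eps itself is not
kept <- (1 + eps) != 1

# The absolute gap between neighbouring doubles grows with magnitude,
# so the error in (big + 2 - 0.1 - big) tends to grow as big increases
drift <- sapply(10^(9:14), function(big) (big + 2 - 0.1 - big) - 1.9)

print(eps)
print(c(lost = lost, kept = kept))
print(abs(drift))
```

The error is relative to the size of the numbers involved, so the same subtraction that is harmless near 1 becomes visibly wrong near 1e14.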
5. # By default R prints at most 7 significant digits
# The value is rounded to those digits and trailing zeros are dropped,
# so numbers that differ slightly can print identically
(1 / 237)
(1 / 237) * 237
(1 / 237) * 237 - 1
seq(0.1, 0.9, by = 0.1)
seq(0.1, 0.9, by = 0.1) - 1:9 / 10
# Tricky to get to print exactly:
formatC((1 / 237) * 237, digits = 20)
formatC(seq(0.1, 0.9, by = 0.1), digits = 20)
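As an alternative to `formatC()`, base R offers two other ways to see more digits. This is a small sketch rather than a recommendation; the variable name `shown` is illustrative.

```r
x <- (1 / 237) * 237

# Default printing uses getOption("digits"), which is 7 unless changed
print(x)

# Ask for more significant digits for a single call
print(x, digits = 17)

# sprintf() gives explicit control over the number of decimal places
shown <- sprintf("%.20f", seq(0.1, 0.9, by = 0.1))
print(shown)
```

Around 17 significant digits are enough to distinguish any two different doubles; digits beyond that show the exact binary value rather than extra precision.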
6. # When working with floating point numbers (numeric)
# (but not integers, this is the one place where the
# difference is important) never test for equality with ==
a <- seq(0.1, 0.9, by = 0.1)
b <- 1:9 / 10
all(a == b)
all.equal(a, b)
all(abs(a - b) < 1e-6)
# Similarly, need to be careful with < and > etc
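One practical detail when using `all.equal()` in a condition: it returns either `TRUE` or a character description of the difference, so it is usually wrapped in `isTRUE()`. The sketch below shows this, together with a hypothetical helper `near()` that is not part of the slides or of base R.

```r
a <- seq(0.1, 0.9, by = 0.1)
b <- 1:9 / 10

# all.equal() returns TRUE or a character vector describing the difference,
# so wrap it in isTRUE() before using it in if () or stopifnot()
same_within_tolerance <- isTRUE(all.equal(a, b))

# Hypothetical helper (not from the slides): elementwise near-equality
near <- function(x, y, tol = 1e-8) abs(x - y) < tol

print(same_within_tolerance)
print(near(a, b))
print(isTRUE(all.equal(1, 1.1)))
```

The tolerance of `1e-8` is an arbitrary choice for illustration; a suitable value depends on the scale of the data.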
7. # Places where this matters:
#
# * sums
# * calculating the standard deviation
# * inverting a matrix (condition)
# * linear models!
# * maximum likelihood estimation
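To make the standard deviation item concrete, here is a minimal sketch comparing two algebraically equivalent variance formulas. The data are made up for illustration (a large mean with a tiny spread), and the names `naive_var` and `two_pass_var` are hypothetical.

```r
# Three values with a large mean and a tiny spread; the true variance is 1
x <- 1e9 + c(1, 2, 3)
n <- length(x)

# One-pass "textbook" formula: subtracts two nearly equal numbers near 3e18,
# where neighbouring doubles are hundreds apart
naive_var <- (sum(x^2) - n * mean(x)^2) / (n - 1)

# Two-pass formula: centre first, then square
two_pass_var <- sum((x - mean(x))^2) / (n - 1)

print(naive_var)
print(two_pass_var)
print(var(x))
```

Centring before squaring keeps the intermediate values small, so the rounding error stays small relative to the answer.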
8. Optimisation
If, and only if, your code is too slow
First use system.time() to figure out
exactly how long things are taking: you
need this so you can check your changes
actually speed things up
Then see what is taking the longest
amount of time with the profr package
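A minimal sketch of the timing step, using only base R. The function `slow_sum_of_squares()` is a made-up example of slow code, not something from the course.

```r
# Hypothetical slow function, used only for illustration
slow_sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# system.time() returns user, system and elapsed times in seconds
timing <- system.time(result <- slow_sum_of_squares(1e5))

print(timing)
print(timing[["elapsed"]])
```

Recording the elapsed time before and after a change is what lets you confirm the change actually helped.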
9. General advice
• Start with the slowest part of your code
• Use built-in R functions, where possible
• Use vectorised functions, where possible
• Think through your basic algorithm
• Knowledge of basic CS algorithms and data structures is very helpful
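The advice about built-in and vectorised functions can be sketched as follows. The helper `loop_sqrt()` is hypothetical and exists only to contrast with the built-in `sqrt()`; actual timings will vary by machine.

```r
x <- runif(1e5)

# Loop version: one R-level operation per element
loop_sqrt <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) out[i] <- sqrt(v[i])
  out
}

# Vectorised version: one call to the built-in handles the whole vector
vec_result <- sqrt(x)
loop_result <- loop_sqrt(x)

# Check the answers agree before comparing speed
print(identical(loop_result, vec_result))
print(system.time(loop_sqrt(x)))
print(system.time(sqrt(x)))
```

Checking that both versions give the same answer first avoids optimising towards a faster but wrong result.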
11. Continuing education
Learn more about R.
Learn more about your other tools.
Professional development
12. Mailing list
Sign up to R-help: https://stat.ethz.ch/mailman/listinfo/r-help
Make sure to set up filters
Skim interesting subjects and read them
Don’t be afraid to post
(use a pseudonym if necessary)
13. Read books
Phil Spector. Data Manipulation with R.
William N. Venables and Brian D. Ripley. Modern Applied Statistics with S.
Frank E. Harrell. Regression Modeling Strategies.
José C. Pinheiro and Douglas M. Bates. Mixed-Effects Models in S and S-PLUS.
14. Read papers
The R Journal: http://journal.r-project.org/
The Journal of Statistical Software: http://www.jstatsoft.org/
15. Learn your tools
• Touch typing
• Text editor
• Command line
• Caffeine
• Email
16. Professional development
The aspects of being a statistician, apart
from knowing statistics.
Principally communication: written,
spoken, visual and electronic.
Take every opportunity you can to
practice these skills.
17. Visual: Posters, Graphics
Written: Papers, Vita/Resume, Bibliography, Reviews
Electronic: Email, Website, Blog, Code, Video, Slidecast
Spoken: Oral exam, Teaching, Short talk, Long talk
18. Written
Particularly important if you want to be an
academic, or if you’re a PhD student, or
want to become one.
“Style: Toward Clarity and Grace”
Sign up for the thesis writing workshops
when they come around.
Develop a regular habit.
19. My habit
• Roll out of bed at 7am
• Boil water
• Make tea
• Drink tea
• Write for an hour
20. Spoken
Seize every opportunity to practice.
Make use of Tracy Volz - tmvolz@rice.edu.
She is a fantastic resource - if you had to
pay for her, you wouldn’t be able to afford
it.
23. [Plot: value (0 to 1200) by year, 2007 to 2010; series: unread, read]
24. [Plot: read/all (0.0 to 1.0) by year, 2007 to 2010; axis label: from]
25. [Plot: value (50 to 350) by year, 2007 to 2010; series: direct, sent]
26. [Plot: value (50 to 350) by year, 2007 to 2010; series: direct, sent]
27. Inbox Zero
http://www.43folders.com/izero
Merlin Mann
There is no way you will ever be able to respond to — let alone read in
exquisite detail — every email you ever receive for the rest of your life. If
you take issue with this, just wait six months, because, believe me, we’re
all getting a lot more email (and other sundry demands on our attention)
every day. What seems like a doddle today is going to get progressively
more difficult — even insurmountable — unless you put a realistic system
in place now.
28. Your time is priceless
(and wildly limited)
You need an agnostic system for
dealing with mail that isn’t based on
nonces, exceptions, and guilt.
[The] ultimate goal is for you to spend
less time playing with your email and
more time doing stuff.
29. Key concepts
Regularly empty your inbox
Minimal response
Delete, delete, delete
Filters
Email dashes
30. Response does not need to be
proportional to request
“Do you still need this?”
“I don’t know”
“Good idea. I’ll add it to my to do list.”
“Here’s a link that might be what you’re
looking for…”
[Delete]
http://www.43folders.com/2006/03/13/email-cheats
31. Delete!
Most minimal response is none.
“Just remember that every email you
read, re-read, and re-re-re-re-re-read as it
sits in that big dumb pile is actually
incurring mental debt on your behalf.”
Be brutally honest: if you’re not going to
do anything with the email, delete it now.
32. Filters
Grey mail:
“noisy, frequent, and non-urgent items
which can be dealt with all at a pass and
later.”
facebook, comments, university/
department memos, newsletters, mailing
lists
Good catch-all filter: message contains “unsubscribe”
http://www.43folders.com/2006/03/13/filters
34. Patricia Wallace, a techno-psychologist,
believes part of the allure of e-mail—
for adults as well as teens—is similar to
that of a slot machine. “You have
intermittent, variable reinforcement,”
she explains. “You are not sure you are
going to get a reward every time or
how often you will, so you keep pulling
that handle.”
35. Email dashes
Don’t have your email open all day.
Schedule times when you respond to
emails.
You can tackle emails a lot faster when
you batch them up.
Lack self control (like me)? Try an internet
blocker: http://macfreedom.com/
http://www.43folders.com/2006/03/15/email-dash
36. Feedback
http://hadley.wufoo.com/forms/stat405-final-feedback/