This document summarizes Marc Joffe's presentation on extracting and analyzing data from municipal financial disclosures. It discusses gathering pension data from over 1,400 PDF reports published by CalPERS on city pension plans in California. It describes downloading the PDFs, extracting text data using Python scripts, and loading the extracted data into spreadsheets. It also discusses combining the pension data with revenue data from the State Controller to calculate ratios of pension costs to total revenue for each city.
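The final step described above, combining pension costs extracted from the CalPERS PDFs with State Controller revenue figures, can be sketched in a few lines of Python. The city names, dollar figures, and function name below are hypothetical placeholders for illustration, not values or code from the talk.

```python
def pension_revenue_ratios(pension_costs, revenues):
    """Both arguments map city name -> dollar amount.

    Returns the ratio of pension cost to total revenue per city,
    skipping cities with missing or zero revenue.
    """
    ratios = {}
    for city, cost in pension_costs.items():
        revenue = revenues.get(city)
        if revenue:
            ratios[city] = cost / revenue
    return ratios

# Hypothetical example figures, not from the presentation:
costs = {"Anaheim": 25_000_000, "Fresno": 18_000_000}
revenue = {"Anaheim": 500_000_000, "Fresno": 400_000_000}
print(pension_revenue_ratios(costs, revenue))  # {'Anaheim': 0.05, 'Fresno': 0.045}
```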
This handout accompanies a presentation, "Data-Driven Enterprise off Any Beat," by Manuel Torres, enterprise editor at The Times-Picayune | Nola.com. It details what data journalism can do for a journalist, how to get started with data journalism, how to find data and how to learn more about data journalism. It also lists links to many data sets by beat. NewsTrain is a training initiative of Associated Press Media Editors: http://bit.ly/NewsTrain
These are slides from the first webinar in the Accidental Gov Info Librarian series. Presented by Bryna Coonin, it covers the basics of government information.
Babcock Ranch is a groundbreaking development. At about 91,000 acres, it was at the time the largest development concept in Florida, the largest preservation purchase by the State of Florida, and the largest photovoltaic solar plant proposal in the world. Proposed as a 'low carbon eco-city' (~17,000 acres) and a public-private partnership to preserve ~74,000 acres stretching across two counties (Lee and Charlotte), it is part of a 'grand bargain.' The result settled a lawsuit by the Sierra Club, another lawsuit by Lee County, the economic development aspirations of Charlotte County and the property owner (Morgan Stanley), and concerns that go well beyond Florida's borders. Over ten years in the planning, construction has now begun.
Review the American Society for Public Administration (ASPA) Code of Ethics, available on the ASPA website under the "Resources" tab; see the link below.
Code_of_Ethics/Code_of_Ethics1.aspx?hkey=7d5473b7-b98a-48a4-b409-3efb4ceaa006
Recommended policy, Texas Alcoholic Beverage Code:
Presentation to the 2012 National Association of Government Webmasters conference in Kansas City, MO on best practices for using city and county web sites to share government financial data with citizens.
Webinar: How Penton Uses MongoDB as an Analytics Platform within their Drupal...
NorthPoint Digital worked with the Penton and MongoDB teams to deliver a MongoDB-based solution, Govalytics, to serve city and county governments. We will review the design decisions made and the steps taken to implement and integrate it into the existing digital platform.
In the session, we will review:
How Govalytics fits into Penton's entire digital platform
The business drivers for choosing MongoDB (with Product Owner testimony) and why it was so successful
How NorthPoint Digital implemented a complete, highly interactive UX solution powered by MongoDB as part of an integrated solution, not just as a database
Roadmap for the future – how the solution was designed to be independently scalable
This assignment covers chapter 8 and is due by 10:00 p.m. on Monday, 4/4.
To answer these questions, use the 2014 CAFR for the City of Los Angeles found at www.lacity.org and the 2014 annual report for the Los Angeles City Employees' Retirement System (LACERS). The web address for the pension fund (LACERS) can be copied from page 155 of the CAFR for the city.
This assignment includes 10 questions plus one extra credit question.
Instructions
Answer questions 1-7 and the extra credit using the CAFR for the City of Los Angeles. Answer questions 8-10 using the report for the LACERS pension fund.
QUESTION 1
1. The city has which of the following fund types (check all that apply)?
Investment Trust Fund
Pension Trust
Tax Agency
Agency funds that are not a tax agency.
1 point
QUESTION 2
1. Read the plan descriptions of the 3 pension plans operated by the City and match the pension plan with where the pension expense for those employees would be recorded.
Pensions
LACERS
DWP plan
A.
Governmental activities primarily
B.
Both governmental activities and business-type activities.
C.
Business-type activities
1 point
QUESTION 3
1. Which description below best describes the Pension and Other Postemployment Funds for the city of Los Angeles?
The assets equal the liabilities in this type of fund.
The Pension Funds have significantly fewer assets than liabilities, which is what I would expect.
The Pension Funds have significantly fewer assets than liabilities, which is NOT what I would expect.
The Pension Funds have significantly more assets than liabilities, which is what I would expect.
The Pension Funds have significantly more assets than liabilities, which is NOT what I would expect.
1 point
QUESTION 4
1. The value of investments held by the Pension Funds:
Remained the same value
Increased in value.
Changes in value cannot be determined by the financial statements.
Decreased in value.
1 point
QUESTION 5
1. Who contributed more to the pension plans in the current year?
the city.
the employees.
Cannot be determined from the CAFR.
the city and employees contributed equally.
1 point
QUESTION 6
1. The enterprise funds of the city have:
a pension liability that indicates the city pension plans are over-funded.
a pension liability that indicates the city pension plans are underfunded.
no pension liability reported in the enterprise funds.
a pension liability that indicates the total amount owed to enterprise fund employees for pensions.
1 point
QUESTION 7
1. In addition to accumulating resources to pay pensions, the city is also accumulating resources to pay for health benefits upon retirement.
True
False
1 point
QUESTION 8
1. During the last 3 years, the LACERS pension plan has become:
more overfunded
less underfunded
less overfunded
more underfunded
1 point
QUESTION 9
1. Which of the following had the greatest ...
Deanna’s Input for Question 3:
As Chief Financial Management Officer of Riverside County, water resources are a top priority to ensure public needs are adequately being met for all county communities. The sources of drinking water (both tap water and bottled water) include rivers, lakes, streams, ponds, springs, and wells. It is extremely important to eliminate as many contaminants as possible from drinking water to protect public health. Given such high demand in the county for clean drinking water, there is a need to create a new water management policy, which includes the development of a new drinking water treatment plant to respond to this critical need. The proposed drinking water treatment plant could produce close to 3 million gallons of drinking water per day, easing the water crisis. In addition, the county could potentially sell water to neighboring counties and the agricultural sector to help increase local revenue to the county. The policy requires an initial outlay of $20M and subsequent annual outlays of $5M for the foreseeable future.
How would I approach this task?
The first step would be to convene an interdepartmental capital allocation committee to examine the proposed policy in combination with existing capital improvement projects and the overall county master plan for land use. If committee members agree on the feasibility of moving forward, the next step would be to update the existing capital improvement plan (CIP), which spans multiple years, to ensure adequate resources are available for the proposed water management policy and new facility. Edits to the existing CIP would include the following:
1. Capital budget manual – contains a calendar or flowchart of the process, instructions, and forms for departments to use when completing requests
2. Cost projections – determining exact costs of each project
3. Revenue estimations – detailed estimate and availability of revenue, both reoccurring and from bond sales
4. Debt planning – outlining debt needs; scheduling voter referendum to authorize debt funding; obtaining voter approval on bond sales
5. Public hearing – schedule public hearing, prior to capital budget approval
6. Prepare final executive budget request
Information, I would need to know:
· Goals, timelines, and identification of various funding sources
· Financial analysis to include: 1) Cost-benefit analysis – cost vs. overall net benefit;
· Financial Condition Analysis
I. Existing long-term debt commitments/obligations
II. Population Growth Trends (e.g., housing, business)
· History of existing and recent user and property taxes – provides insight into existing taxes currently being levied on the community; property sales and tax info would be instrumental in helping to determine trends in sales and the ability to generate revenue through levies (imposing a tax, fee, or fine) and regional commerce activity.
· Fiscal S.
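The cost-benefit analysis listed above comes down to a net-present-value calculation. The $20M initial outlay and $5M annual outlays are from the scenario; the revenue figure, discount rate, and planning horizon below are hypothetical placeholders, since the scenario does not specify them.

```python
def npv(rate, initial_outlay, annual_net_cash_flows):
    """Discount each year's net cash flow back to the present."""
    total = -initial_outlay
    for year, cash in enumerate(annual_net_cash_flows, start=1):
        total += cash / (1 + rate) ** year
    return total

annual_revenue = 9_000_000   # hypothetical water-sales revenue
annual_outlay = 5_000_000    # annual outlay from the scenario
horizon_years = 10           # assumed planning horizon
flows = [annual_revenue - annual_outlay] * horizon_years
# Positive NPV suggests the project's discounted benefits exceed costs.
print(round(npv(0.05, 20_000_000, flows)))
```

A real analysis would also weigh nonmonetary benefits (public health, regional supply security) that NPV alone does not capture.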
For providing context for breaking news or developing enterprise stories off your beat, databases are your friend. Learn how to develop a data state of mind, find newsworthy data and begin to analyze data sets. Spot the enterprise stories in the numbers, whether your beat is breaking news, sports, health, business, education, local government or cops and courts. It is accompanied by two handouts: Bringing a data mindset to your reporting by Houston, and Pivot tables on Google Sheets by Dylan Tiger. Brant Houston is the Knight Chair in Investigative Reporting at the University of Illinois, where he oversees an online newsroom, CU-CitizenAccess.org. For more info on the News Leaders Association's NewsTrain, see https://www.newsleaders.org/newstrain.
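As a rough illustration of the aggregation pattern the pivot-table handout covers, here is the same idea in plain Python; the incident data and field names are invented for the example, not from the handout.

```python
from collections import defaultdict

def pivot(rows, row_key, col_key, value_key):
    """Rows become one axis, columns another; cells sum the value field."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {k: dict(v) for k, v in table.items()}

incidents = [  # hypothetical cops-and-courts data
    {"district": "North", "year": 2023, "count": 12},
    {"district": "North", "year": 2024, "count": 9},
    {"district": "South", "year": 2023, "count": 7},
]
print(pivot(incidents, "district", "year", "count"))
```

In Google Sheets the equivalent is Insert > Pivot table with district as rows, year as columns, and SUM of count as values.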
Forging a federal government open data agenda, by Liv Watson (Workiva)
The federal government possesses an enormous amount of valuable public data, which should be used to improve government services and promote private sector innovation. This legislation seeks to achieve these goals by creating an expectation that – by default – government data will be open and available whenever possible. Specifically, this bill defines open data without locking in yesterday's technology; creates standards for making federal government data available to the public; requires the federal government to use open data to improve decision making; and ensures accountability by requiring oversight during key periods of implementation.
As the importance of having a data strategy in place is sinking in, many organizations have added a chief data officer (CDO) to their executive team to help create and implement that strategy. But every organization is doing this a little bit differently. This talk will describe how a variety of industries and organizations are using CDOs and will make recommendations for best practices.
I’ll present the new knowledge discovery tools we are building at Diffeo. Unlike traditional search engines that use keywords, Diffeo provides an in-browser knowledge base that accelerates information gathering about people, companies, chemical compounds, cyber events, or other real world entities. I’ll describe how Diffeo uses active learning to encourage long and deep user interactions in order to recommend new content for in-progress articles. As you write, the search results get better and more interesting, because the system can see more precisely which entity you mean and which you don’t (disambiguation) and also what you don’t know yet about the entity (discovery).
Finally, I’ll describe our experience organizing the Text REtrieval Conference (TREC) tracks on Knowledge Base Acceleration (KBA) and Dynamic Domain (DD), which are pushing the state of the art in knowledge discovery on large streams. I’ll show you how to access the largest corpus of streaming text data ever released for public evaluations.
An exposé on human-centered design, as related to data science and “medium data”. Examples of great API design will be showcased, as well as other end-user facing tools that can enable data scientists to share their observations with the world.
Mobile Technology Usage by Humanitarian Programs: A Metadata Analysis
CommCare, developed by Dimagi Inc., is an open-source mobile technology platform that supports hundreds of humanitarian frontline programs worldwide. The objective of this analysis is to demonstrate how CommCare metadata contains a wealth of information that can inform humanitarian programs in their use of mobile technology. This understanding can help programs determine the most effective way to implement CommCare or other mobile technology in resource-poor settings. A typical CommCare user is a frontline worker, such as a community health worker who provides outreach to pregnant women and children. An important feature of CommCare is that it supports case management, allowing users to register, update, and close cases in their CommCare application. A case is usually a user’s client, e.g., a pregnant woman who is supported by the CommCare user. While using CommCare, the user fills out electronic forms which eventually get submitted to the CommCare cloud server. The cumulative number of forms submitted by CommCare users as of December 2014 was just over 10 million. Metadata for each form submitted through CommCare are stored in Dimagi’s data platform; included in a form’s metadata are date and time stamps for when each form was started and ended by the user and when the form was eventually received by the cloud server.
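To make the metadata concrete, here is a sketch of one calculation those timestamps enable: the median time a worker spends completing a form. The field names and timestamps are hypothetical stand-ins, not Dimagi's actual schema.

```python
from datetime import datetime
from statistics import median

def completion_seconds(forms):
    """Median seconds between when a form was started and ended."""
    durations = [(f["ended"] - f["started"]).total_seconds() for f in forms]
    return median(durations)

forms = [  # hypothetical form metadata records
    {"started": datetime(2014, 12, 1, 9, 0, 0),
     "ended": datetime(2014, 12, 1, 9, 4, 30),
     "received": datetime(2014, 12, 1, 11, 0, 0)},
    {"started": datetime(2014, 12, 1, 10, 0, 0),
     "ended": datetime(2014, 12, 1, 10, 2, 0),
     "received": datetime(2014, 12, 2, 10, 0, 0)},
]
print(completion_seconds(forms))  # median of 270s and 120s -> 195.0
```

The gap between `ended` and `received` would similarly measure how long forms wait before reaching the cloud server, a useful proxy for connectivity in resource-poor settings.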
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
The main objective of this workshop is to give the audience hands-on experience with several Hadoop technologies and jump-start their Hadoop journey. In this workshop, you will load data and submit queries using Hadoop! Before jumping in to the technology, the founders of DataKitchen review Hadoop and some of its technologies (MapReduce, Hive, Pig, Impala and Spark), look at performance, and present a rubric for choosing which technology to use when.
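As a warm-up for the workshop's MapReduce material, here is the classic word count expressed as a mapper and a reducer, with the shuffle simulated locally so the whole flow is visible; in a real cluster, scripts like these would run under Hadoop Streaming.

```python
from itertools import groupby

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Sum all counts for one word.
    return (word, sum(counts))

lines = ["big data big ideas", "data kitchen"]
# Map phase
pairs = [kv for line in lines for kv in mapper(line)]
# Shuffle: sort by key so equal keys are adjacent, as Hadoop guarantees
pairs.sort(key=lambda kv: kv[0])
# Reduce phase
result = dict(
    reducer(word, (c for _, c in group))
    for word, group in groupby(pairs, key=lambda kv: kv[0])
)
print(result)  # {'big': 2, 'data': 2, 'ideas': 1, 'kitchen': 1}
```

Hive and Pig generate jobs of this shape for you from SQL-like or dataflow scripts, which is why they are often the faster path for analysts.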
We’ve all been told to “work smarter, not harder.” But what does working smarter really mean? In the world of finance and trading, working smarter means working differently. None of us can compete against computers stacked inches away from the stock exchange or blue chip companies with multi-million dollar marketing campaigns. The key to winning is to go where the big guys haven’t, and the way to do that is through diverse datasets. In this talk, you will discover the theory and tools to discover new datasets from unexpected sources in order to gain an upper hand in both finance and business. So whether you’re a quant who trades in his bedroom or a restaurateur looking to grow his business, you’ll learn how the diversity of data can be the sharpest knife in your set.
Data Science at Dow Jones: Monetizing Data, News and Information
In this presentation I will describe the way Data Science supports the business of information and news at Dow Jones. Specifically, I will describe how we are introducing innovative and advanced large-scale information mining and analytic approaches not only into Dow Jones’ products but also into our strategy and decision making processes. Our goal is to impact every aspect of Dow Jones: from the way journalism is produced in the newsroom, to the way we create and deliver institutional products, to the way we improve retention and acquisition of subscribers. While the task seems broad and daunting, we have already achieved various successes through the application of machine learning, data mining, advanced analytics and big data approaches. In this presentation I will describe how we have achieved this, including our tools, data, approaches and mechanisms, as well as describe what our plans are going forward.
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share with you a recent project we did and why we selected Spark, Python, and Parquet. My plan is take you through a use case that involves loading, transforming, aggregating, and persisting the dataset. We’ll use an open dataset consisting of full fund holdings graciously provided by Morningstar. My goal in presenting this use case are to have the audience learn about how these technologies can be applied to a real world problem and to inspire members of the audience to start learning these technologies and applying them to their own projects.
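The four steps named above (load, transform, aggregate, persist) can be mirrored on a toy holdings table. This standard-library sketch only illustrates the shape of the job; the talk itself uses Spark and writes Parquet, and the funds, tickers, and weights here are invented rather than taken from the Morningstar dataset.

```python
import csv
import io
from collections import defaultdict

raw = """fund,ticker,weight
FundA,AAPL,0.40
FundA,MSFT,0.60
FundB,AAPL,1.00
"""

# Load: read the raw CSV into dict rows.
rows = list(csv.DictReader(io.StringIO(raw)))
# Transform: parse weight strings into floats.
for r in rows:
    r["weight"] = float(r["weight"])
# Aggregate: total weight held in each ticker across all funds.
by_ticker = defaultdict(float)
for r in rows:
    by_ticker[r["ticker"]] += r["weight"]
# Persist: a Spark job would write Parquet here; we emit CSV text.
out = io.StringIO()
csv.writer(out).writerows(sorted(by_ticker.items()))
persisted = out.getvalue()
print(sorted(by_ticker))  # ['AAPL', 'MSFT']
```

In PySpark the same pipeline would be a `read.csv`, a cast, a `groupBy(...).sum()`, and a `write.parquet`, gaining distributed execution and a columnar on-disk format.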
Building a Predictive Analytics Solution with Azure ML
Create and operationalize a predictive model using Microsoft Azure Machine Learning.
– Perform the typical steps involved in building a predictive analytics solution, such as data ingestion, data cleansing, data exploration, feature engineering, model selection, and evaluation of model results
– Learn how to use machine learning in big data scenarios, using tools like Hadoop and SQL Server to process and work with such data
Finding and classifying the mentions of the things named in text, often called Named Entity Recognition or NER, is a fundamental task in many search and analysis applications. Mature, robust NER technology is available for many languages and domains, from people, places, and products, to diseases, genes, and molecules. However, for emerging tasks like knowledge-base construction, mentions alone are insufficient.
In this presentation we’ll explore techniques that go beyond names to:
link mentions to one another and to rich knowledge sources like Wikidata
discover and characterise the relationships between entities that are explicit in the text
And we’ll discuss some of the most important practical implications of these advancements for open data science.
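A toy version of the linking step described above: resolve an ambiguous mention against a tiny hand-made knowledge base using context-word overlap. The entries, IDs, and scoring are invented for illustration; real systems link to sources like Wikidata using far richer features.

```python
# Hypothetical mini knowledge base: two entities share the surface name.
KB = {
    "Q312": {"name": "Apple", "context": {"iphone", "mac", "cupertino"}},
    "Q89":  {"name": "Apple", "context": {"fruit", "orchard", "pie"}},
}

def link(mention, sentence_words):
    """Pick the KB entry whose context overlaps the sentence most."""
    candidates = [
        (len(entry["context"] & sentence_words), qid)
        for qid, entry in KB.items()
        if entry["name"].lower() == mention.lower()
    ]
    score, qid = max(candidates)
    return qid if score > 0 else None  # no overlap -> leave unlinked

words = {"apple", "released", "a", "new", "iphone"}
print(link("Apple", words))  # Q312
```

Relation extraction then builds on resolved entities, e.g. attaching ("Apple", "headquartered_in", "Cupertino") when the text states it explicitly.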
According to Credit Suisse’s Gender 3000 report, at the end of 2013, women accounted for 12.9% of top management in 3,000 companies across 40 countries. However, since 2009, companies with women making up 25-50% of their management team returned 22-29%. If companies with women in management outperform so dramatically, what would happen if you invested in women-led companies? Karen Rubin will explore this question and share her findings after running a 12-year investment simulation.
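The core of such a simulation is compounding returns and comparing terminal values. The return series below are invented for illustration; Rubin's talk reports results from a real 12-year backtest.

```python
def terminal_value(start, returns):
    """Compound a starting balance through a series of period returns."""
    value = start
    for r in returns:
        value *= (1 + r)
    return value

women_led = [0.02, 0.01, 0.03]   # hypothetical period returns
benchmark = [0.01, 0.01, 0.01]   # hypothetical benchmark returns
print(terminal_value(10_000, women_led) > terminal_value(10_000, benchmark))  # True
```

A real backtest also has to handle rebalancing, survivorship bias, and transaction costs, which is where most of the work (and most of the surprises) lives.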
Data science allows us to turn a dark forest into a world of perpetual twilight by giving us the tools to better understand the data that surrounds us. Unfortunately, in this world of twilight we still need a flashlight to get a clean, crisp image of our immediate surroundings. We will talk about how to use deep domain expertise as that flashlight, shedding light on our understanding of data. Our focus will be on using text analysis as a means to examine qualitative information in a structured, quantitative way. We will draw heavily from examples in complex central bank policy and financial regulation.
Open Source Tools & Data Science Competitions
This talk shares the presenter’s experience with open source tools in data science competitions. In the past several years, Kaggle and other competitions have created a large online community of data scientists. In addition to competing with each other for fame and glory, members of this community also generously share knowledge and insights through forums and open source code. The open competition and sharing have resulted in rapid progress in the sophistication of the entire community. This presentation will briefly cover this journey from a competitor’s perspective and share hands-on tips on open source tools that have proven popular and useful in recent competitions.
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
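A minimal version of the pipeline-on-text-data workflow the tutorial describes, assuming scikit-learn is installed; the toy corpus and labels are invented for the example.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["good film", "great movie", "bad film", "awful movie",
         "good plot", "terrible plot"]
labels = [1, 1, 0, 0, 1, 0]

# A pipeline: text -> bag-of-words features -> classifier.
pipe = make_pipeline(CountVectorizer(), LogisticRegression())
# Cross-validation fits and scores the whole pipeline per fold.
scores = cross_val_score(pipe, texts, labels, cv=2)
print(len(scores))  # one accuracy score per fold
```

Bundling the vectorizer inside the pipeline matters: it ensures the vocabulary is learned only from each fold's training split, avoiding leakage into the validation score.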
Bridging the Gap Between Data and Insight using Open-Source Tools
Despite the proliferation of open-source tools for analysis (such as Python and R) and those used for visualization (such as JavaScript / D3), there often exist significant gaps between these areas, and those of us trying to navigate the complete arc from data to insight can encounter many obstacles along the way. Fortunately, in recent years there have been many efforts to fill these needs, and today distilling a meaningful visualization from raw data is faster and easier than ever before.
In this talk we will use examples in geospatial analysis and visualization to illustrate how open-source tools like Python, geopandas, and TileMill work together. Using examples from the RunKeeper mobile app, we will show how we currently use these tools to better understand our customers and their data, and to communicate with our colleagues, external partners, and the data community at large.
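A small taste of that data-to-insight arc for GPS traces: summarizing a batch of (lat, lon) points before mapping them. The coordinates below are invented, and a real pipeline would lean on geopandas for this kind of work.

```python
def bounding_box(points):
    """Return ((min_lat, min_lon), (max_lat, max_lon)) for a GPS track."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons)), (max(lats), max(lons))

# Hypothetical RunKeeper-style track points (lat, lon):
track = [(42.35, -71.06), (42.36, -71.05), (42.34, -71.07)]
print(bounding_box(track))  # ((42.34, -71.07), (42.36, -71.05))
```

The bounding box is what a map renderer needs first: it sets the viewport before any styling in a tool like TileMill happens.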
Human-generated text may be the next frontier for big data analysis, but we humans are complicated beasts and the text we generate is messy and complicated in ways that can confound analysis. We’ll describe the top ten mistakes people make when they start doing text analysis, and hopefully save you from making a few of these mistakes yourself.
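One mistake of this kind, shown concretely (this particular example is ours, not necessarily one of the presenter's ten): counting surface strings as tokens without normalizing case and punctuation, which silently splits one word into several.

```python
import string

def naive_tokens(text):
    # Splits on whitespace only; "Data," and "DATA!" stay distinct.
    return text.split()

def normalized_tokens(text):
    # Strip punctuation and lowercase before splitting.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()

text = "Data, data everywhere. DATA!"
print(len(set(naive_tokens(text))))       # 4 distinct "words"
print(len(set(normalized_tokens(text))))  # 2 after normalization
```

The deeper lesson is that every such normalization choice (stemming, stopwords, casing) changes the counts downstream analysis depends on, so the choices should be deliberate and documented.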
Securing your Kubernetes cluster: a step-by-step guide to success! (Katia Himeur)
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
A tale of scale & speed: How the US Navy is enabling software delivery from l...
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Pushing the limits of ePRTC: 100ns holdover for 100 days, by Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Elevating Tactical DDD Patterns Through Object Calisthenics, by Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Generative AI Deep Dive: Advancing from Proof of Concept to Production, by Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do..., by UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish Caching, by Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T..., by BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024, by Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales, by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Keeping Governments Accountable with Open Data Science: Extracting and Analyzing Municipal Financial Data
1. EXTRACTING & ANALYZING DATA FROM MUNICIPAL FINANCIAL DISCLOSURES
Marc Joffe
Open Data Science Conference, Boston 2015
@opendatasci
2. Extracting and Analyzing Data from Municipal Financial Disclosures
Marc Joffe
Public Sector Credit Solutions
Open Data Science Conference
Boston, May 2015
3. The Research Question
• How is the cost of funding public employee pensions affecting
California cities?
• I hoped to answer the question by gathering pension expenditure
data for all cities in the state.
• Main data points:
• Current and future contribution amounts
• Funded ratio
4. Data on City Pensions
• The best sources for information on local government pension costs
are (1) the municipality’s audited financial statements (CAFRs) and (2)
actuarial valuation reports published by the pension fund.
• In California (and some other states), most cities rely on a multi-employer pension system. The system in California, CalPERS, publishes one actuarial report for each local government pension plan it administers – about 3000 in all.
• I was just interested in the roughly 1400 plans covering city
employees. CalPERS publishes a unique PDF for each plan.
• The main challenge is thus to get the 1400 PDFs and extract key data
points (such as future actuarially required contributions) from them.
5. Gathering the Pension Data (1 of 2)
• Found a web page that had links to all the actuarial valuation PDFs.
• In this case: http://www.calpers.ca.gov/index.jsp?bc=/about/forms-pubs/calpers-reports/actuarial-reports/home.xml
• Downloaded this page and scraped all the links
• This can be done with a python script (ideally leveraging an HTML processing
library like BeautifulSoup) or by copying/pasting to Excel. When copying
content from a web page to Excel, it is better to use Internet Explorer than
other browsers.
• Ran a command line script to download all the links. This shell script
or windows command file can use curl or wget to retrieve the PDFs.
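The link-scraping step above can be sketched in Python. This is a minimal, dependency-free version using the standard library's html.parser (BeautifulSoup, as the slide suggests, makes this even easier); the sample markup in the usage below and the ".pdf"-suffix heuristic are assumptions, not the actual CalPERS page structure:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PDFLinkExtractor(HTMLParser):
    """Collect absolute URLs for every <a> link that points at a PDF."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".pdf"):
                self.links.append(urljoin(self.base_url, href))

def extract_pdf_links(html, base_url):
    """Return the list of absolute PDF URLs found in an HTML page."""
    parser = PDFLinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Each collected URL could then be fetched directly with urllib.request.urlretrieve, or written out as a curl/wget script exactly as the slide describes.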
6. Gathering the Pension Data (2 of 2)
• Because the valuation PDFs have embedded text, no OCR was necessary. I pulled out the text with Poppler's pdftotext command-line executable, using the -layout option to make the outputs more readable.
• Because the PDFs had very consistent formats (they appear to have
been output by a report generator), I could take advantage of
patterns in the text. I wrote Python scripts to read each file and
extract just the portions I needed. I output the strings I captured to a
CSV file.
• I loaded the CSV file into Excel for further analysis.
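The extraction stage described above might look roughly like this in Python. The pdftotext invocation mirrors the -layout usage from the slide, but the field label and dollar-amount layout in the example are hypothetical; real CalPERS valuations would need their own patterns:

```python
import csv
import re
import subprocess

def pdf_to_text(pdf_path):
    """Run Poppler's pdftotext with -layout, returning the extracted text."""
    return subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    ).stdout

def extract_amount(text, label):
    """Find a labeled dollar amount, e.g. 'Required Contribution $1,234,567.89'."""
    m = re.search(re.escape(label) + r"\s+\$?([\d,]+(?:\.\d+)?)", text)
    return float(m.group(1).replace(",", "")) if m else None

def write_rows(rows, out_path):
    """Write (plan, amount) pairs to a CSV file for spreadsheet analysis."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["plan", "required_contribution"])
        writer.writerows(rows)
```

Because the reports came from a report generator with a consistent layout, one regular expression per data point was generally enough; irregular PDFs would need more defensive parsing.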
7. Answering the “So What?” Question with Revenue Data
• The raw pension numbers are not that interesting unless placed into
some context. I wanted to calculate the ratio of pension costs to total
revenue for each city because that is a fiscal health measure. A
ranking of cities by this measure is interesting – especially to cities
near the top of the ranking!
• The actuarial valuation reports provide actuarially required
contributions for the upcoming fiscal year. I could get revenue data
from CAFRs but these are published on a delayed basis.
• A more timely source proved to be a data set provided by the State
Controller via a Socrata Open Data platform. See
http://bythenumbers.sco.ca.gov.
8. Mashing up the Data and Analyzing
• I now had two data sets: pension costs and revenues.
• The remaining steps needed to calculate the pension cost/revenue ratios
are as follows:
• Add up all the plans for each city to get total city pension costs.
• Map the city names in the CalPERS data set to the city names in the State Controller
data set. This was generally straightforward, but there were a couple of oddities
(such as Paso Robles = El Paso de Robles)
• Using the common key (i.e., standardized city name), combine the two data sets
• Calculate the ratio
• Sort in descending order
• I did the above in Excel and Google Sheets. I could have used Python or
another scripting language but I find spreadsheets easier.
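The mash-up steps listed above can be sketched with plain Python dictionaries (a spreadsheet, as the slide notes, works just as well). The Paso Robles mapping comes from the slide; all the figures in the usage below are invented for illustration:

```python
def pension_revenue_ratios(plan_costs, revenues, name_map=None):
    """plan_costs: iterable of (city, cost) rows, one per pension plan.
    revenues: dict mapping standardized city name -> total revenue.
    name_map: optional dict translating CalPERS city names to the
    State Controller's names. Returns [(city, ratio)] sorted descending."""
    name_map = name_map or {}
    totals = {}
    for city, cost in plan_costs:
        city = name_map.get(city, city)            # standardize the key
        totals[city] = totals.get(city, 0) + cost  # sum all plans per city
    ratios = [
        (city, cost / revenues[city])
        for city, cost in totals.items() if city in revenues
    ]
    return sorted(ratios, key=lambda r: r[1], reverse=True)
```

Cities missing from the revenue data are silently dropped here; in practice each unmatched name deserves a manual look, since it may just be another naming oddity.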
10. Our next project: govwiki.us
URL: http://govwiki.us
Repo: https://github.com/govwiki/govwiki.us
Online database of all US local governments.
• Obtained a list of 91,000 local governments from
the US census
• Performed rough geocoding
• Now gathering additional data from public
sources in California
• Hope to launch in August
• Also hope to create a Wikipedia interface
• Environment: MySQL, Node.js, CoffeeScript
11. Original PDF Liberation Presentation – 1/2014
• In January 2014, I worked with the Sunlight Foundation to host the
“PDF Liberation Hackathon” in New York, Washington, Chicago and
San Francisco.
• A list of PDF extraction solutions and sample PDF extraction problems
available at: http://pdfliberation.wordpress.com/
• Following are some slides related to that event
12. An Example of How PDF Liberation Can Generate News
• Working with Mortgage Resolution Partners, the City of Richmond has
proposed to use its power of eminent domain to refinance mortgages
for underwater homeowners
• In July, the media reported that 624 properties had been chosen
• I wanted to know which ones, so I filed a California Public Records Act
request . . .
13. The Request…(Make it Very Specific)
Dear Ms. Holmes,
Pursuant to my rights under the California Public Records Act (Government Code Section 6250 et seq.), I ask to obtain a copy of the following, which I understand to be held by your agency:
Attachments A, B and C to letters sent to mortgage servicers offering to purchase mortgage loans dated on or about July 31, 2013. The form letter is available on the internet at
http://www.contracostatimes.com/west-county-times/ci_23760190/document-city-richmond-letter-mortgage-lenders?source=pkg. I understand that 32 such letters have been sent, so this request
involves as many as 96 unique documents.
The purpose of this request is to obtain a list of 624 mortgages which Richmond is offering to purchase containing the property addresses, mortgage amounts, appraised values, servicer names, and, if
possible, the name of the Residential Mortgage Backed Securities (RMBS) deal holding each mortgage. If you can provide this listing in a more concise format, I will accept it in lieu of the attachments
described in the previous paragraph.
I ask for a determination on this request within 10 days of your receipt of it, and an even prompter reply if you can make that determination without having to review the record[s] in question.
If you determine that some but not all of the information is exempt from disclosure and that you intend to withhold it, I ask that you redact it for the time being and make the rest available as
requested.
In any event, please provide a signed notification citing the legal authorities on which you rely if you determine that any or all of the information is exempt and will not be disclosed.
If I can provide any clarification that will help expedite your attention to my request, please contact me by phone at 415-578-0558 or by email at marc@publicsectorcredit.org. I ask that the requested
documents be sent to be in electronic format via return email. If you must provide paper documents, I ask that you notify me of any duplication costs exceeding $50 before you duplicate the records so
that I may decide which records I want copied. I can visit your office to collect the documents once they have been duplicated.
Thank you for your time and attention to this matter.
Sincerely,
Marc D. Joffe
1655 North California Blvd. Unit 162
Walnut Creek, CA 94596
15. Processing
• Loaded the four PDFs into Able2Extract – a commercial PDF conversion tool that
costs about $100*
• Converted the PDFs to Microsoft Excel
• I now had multiple lists of properties with different fields
• I sorted the lists into the same order and then joined them together into one
master spreadsheet
• I found that three properties had mortgage balances over $800,000 and was able
to connect the balances to the addresses
• This made it possible to map the properties and to see the houses themselves on
Google Street View
* Tabula, an open source tool, is reaching the point at which it could perform the same function.
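The sort-and-join step on this slide could also be scripted rather than done by hand in Excel. Field names like "address" and "balance" below are hypothetical stand-ins for whatever columns the converted spreadsheets actually contained:

```python
def merge_property_lists(lists, key="address"):
    """Join several lists of property records on a common key,
    folding each record's fields into one master row per property."""
    master = {}
    for records in lists:
        for rec in records:
            master.setdefault(rec[key], {}).update(rec)
    return list(master.values())

def over_balance(properties, threshold=800_000):
    """Filter for the high-balance mortgages that made the story."""
    return [p for p in properties if p.get("balance", 0) > threshold]
```

With the records merged, mapping the addresses (and pulling up Street View) is a simple export away.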
16. The Results …
• Lead story in the business section of the Chronicle
• Wall Street Journal blog post
• Finding raised at City Council meeting
• In December, Mayor Gayle McLaughlin altered the program to
exclude mortgages above the conforming loan limit ($729,500)
and to focus on blighted neighborhoods.
By the way:
The owner of the house on the right was apparently unaware
that her home had been included in the program. So my initial
theory that this had been a case of cronyism was not borne out.
17. Some of Our Challenges
• Government Financial Statements
• IRS Form 990s (Non-Profit Disclosures)
• House of Representative Financial Disclosures
• Compiling a History of Torture
20. . . . And finding the 1% in Congress by dissecting House Financial Disclosures
This project was taken on by our second-place prize winner. Their best results came from using Captricity.com.
21. Documenting a History of Torture: Parsing Amnesty International Annual Reports
This project was taken on by our first-place prize winner.
22. Three Inter-Related Problems …
• Extracting data from PDFs that contain embedded text
• Using Optical Character Recognition (OCR) to generate text from PDFs
of scans or photographs
• Transforming unstructured text and numbers into a form that can be
readily analyzed. A related IT term is ETL (Extract-Transform-Load)
23. … and some Open Source Solutions
• Extracting data from PDFs that contain embedded text
PDFBox, Poppler
• Using Optical Character Recognition (OCR) to generate text from PDFs
of scans or photographs
Tesseract
• Transforming unstructured text and numbers into a form that can be
readily analyzed. A related IT term is ETL (Extract-Transform-Load)
Tabula (for table identification), OpenRefine
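The first two problems above chain naturally: attempt embedded-text extraction first, and fall back to OCR only when it yields almost nothing. This sketch assumes the Poppler (pdftotext, pdftoppm) and Tesseract command-line tools named on these slides are installed; it is illustrative, not a production pipeline:

```python
import glob
import subprocess

def needs_ocr(text, min_chars=50):
    """Heuristic: a scanned (image-only) PDF yields little or no embedded text."""
    return len(text.strip()) < min_chars

def extract_text(pdf_path):
    """Embedded-text extraction first; OCR fallback for scans."""
    text = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True,
    ).stdout
    if not needs_ocr(text):
        return text
    # Rasterize to PNG pages, then OCR each page with Tesseract.
    subprocess.run(["pdftoppm", "-png", pdf_path, "page"], check=True)
    pages = []
    for png in sorted(glob.glob("page-*.png")):
        out = subprocess.run(["tesseract", png, "stdout"],
                             capture_output=True, text=True, check=True)
        pages.append(out.stdout)
    return "\n".join(pages)
```

The transform step (Tabula, OpenRefine) would then take over on the resulting text, since neither route produces structured data by itself.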
24. … or Licensed Solutions
• Extracting data from PDFs that contain embedded text
PDFLib Text Extraction Tool
• Using Optical Character Recognition (OCR) to generate text from PDFs
of scans or photographs
ABBYY (FineReader or Cloud SDK)
• Transforming unstructured text and numbers into a form that can be
readily analyzed. A related IT term is ETL (Extract-Transform-Load)
SIMX Text Converter