Computational journalism projects

Presentation to Duke University computer science students, February 2012, by Sarah Cohen, Knight Professor of the Practice

Reporterslab.org

Presentation for computational
journalism students
February 2012

STRUCTURED DATA
... and most reporters’ inability to deal with it

New York Times reporters used Word searches and
annotations to analyze Wikileaks documents in 2010
and 2011.

PANDA project trying to help gather data inside newsrooms

Barriers to structured data analysis in the newsroom
• Expensive
• Too hard to collect
• It takes practice
• It takes patience
• Once collected, data has a short shelf life: its
value inside the newsroom effectively ends
once a story is published

Web-scraping software: ephemeral or too expensive for a task not viewed as mission-critical.

Solutions
• User-friendly tool for scraping websites for
structured data
• Packages of algorithms from fraud and other
forensic fields for use with public records
datasets online.
• Packages of queries and statistical tests for
money, dates, geographical identifiers, names
and codes, presented in standard English
• Tools for fuzzy matching of datasets: include
scoring, best match likelihood, interactive
machine learning for different datasets.
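
The fuzzy-matching bullet above can be illustrated with a small sketch. This is only one possible approach, assuming the Python standard library's difflib is good enough for a first pass; the function names, the 0.8 threshold and the sample records are hypothetical and do not come from the presentation.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort tokens so word order does not matter."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def score(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, using difflib's matching-block ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the top-scoring candidate, or (None, score) if below threshold."""
    scored = [(score(name, c), c) for c in candidates]
    top_score, top = max(scored)
    return (top if top_score >= threshold else None, top_score)


# Hypothetical records: a donor name checked against a list of contractors.
contractors = ["Smith, John A.", "Acme Paving Inc.", "Jon Smyth"]
match, s = best_match("John A. Smith", contractors)
print(match, round(s, 2))
```

Returning the score alongside the match lets a reporter review borderline pairs by hand instead of trusting a single cutoff.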

TOO MUCH MATERIAL
With too little information

Too many sources with too little news

• Twitter, Facebook, LinkedIn and other social media
• RSS feeds from other news organizations and blogs
• Press releases from government agencies or beat
subjects

Lack of archiving is just as troubling as the lack of
structure. Reporters can’t hold the powerful
accountable without information from the past.

Solutions
• Archiving users’ feeds locally or in the cloud
• Mash up social media and RSS feeds into an app
that reveals more insight into the sources
• Formalize each reporter’s definition of “news”
through machine learning.
• Alerts for important source material. Example:
changing time of a press conference.
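
The archiving and alert ideas above can be sketched together: keep every version of a source item, and flag it when the text changes. This is a minimal illustration under stated assumptions, not a description of any existing tool. The in-memory dictionary stands in for local or cloud storage, and the item ID and function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def fingerprint(text: str) -> str:
    """Stable hash of an item's text, ignoring differences in whitespace."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def archive_and_alert(archive: dict, item_id: str, text: str) -> Optional[str]:
    """Store a new version of the item if it changed; return an alert message for changes."""
    versions = archive.setdefault(item_id, [])
    digest = fingerprint(text)
    if versions and versions[-1]["digest"] == digest:
        return None  # unchanged since the last fetch
    versions.append({
        "digest": digest,
        "text": text,
        "seen": datetime.now(timezone.utc).isoformat(),
    })
    if len(versions) == 1:
        return None  # first sighting: archive it, nothing to compare against
    return f"CHANGED {item_id}: {versions[-2]['text']!r} -> {text!r}"


archive = {}
first = archive_and_alert(archive, "mayor-presser", "Press conference at 2 p.m. Tuesday")
second = archive_and_alert(archive, "mayor-presser", "Press conference at 2 p.m.  Tuesday")
third = archive_and_alert(archive, "mayor-presser", "Press conference at 10 a.m. Tuesday")
print(third)
```

Because old versions are kept rather than overwritten, the same archive also serves the accountability need: a reporter can look up what a source said in the past.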

Solutions for unstructured data
• Visual extractor of data from scanned forms.
• Separate scanned boxes of documents into
their pieces for further analysis
• Use speech recognition tools on government
audio and video
• OCR video to find the speaker at a hearing

ANTIQUATED METHODS

Our way
• Hand-enter individual items into spreadsheets
• Transcribe interviews, hearings and other audio and video content for searching
• Read each document

A newer way
• Leverage web scraping and paid crowdsourcing for data entry (MT)
• Use speech recognition for the first pass on searchable audio and video
• Use clustering, information extraction and other methods for overview of documents
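
As a rough illustration of clustering for an overview of documents, the sketch below groups texts by shared vocabulary. It is a deliberately simple stand-in, assuming greedy single-pass grouping on Jaccard word overlap. The stopword list, the 0.3 threshold and the sample headlines are hypothetical, and a real document set would call for stronger methods.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Each document joins the first cluster whose seed document is similar enough, else starts a new one."""
    clusters = []  # list of (seed_tokens, [document indexes])
    for i, doc in enumerate(docs):
        t = tokens(doc)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]


docs = [
    "Contract awarded for road paving in the county",
    "County road paving contract awarded to new bidder",
    "School board votes on teacher salaries",
    "Teacher salaries approved by school board",
]
groups = cluster(docs)
print(groups)  # [[0, 1], [2, 3]]
```

The output is a list of document indexes per group, so a reporter can read one or two documents from each group before deciding which pile deserves a full read.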

Reporterslab.org working to tame
audio and video

Associated Press
project to bring order
to unstructured data

REPORTERSLAB.ORG
Creating sample data and documents for researchers based on real
stories


Recently uploaded (20)

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency

Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...

June Patch Tuesday

Columbus Data & Analytics Wednesdays - June 2024

Y-Combinator seed pitch deck template PP

Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe

Essentials of Automations: Exploring Attributes & Automation Parameters

GNSS spoofing via SDR (Criptored Talks 2024)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU

Leveraging the Graph for Clinical Trials and Standards

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...

Northern Engraving | Nameplate Manufacturing Process - 2024

Driving Business Innovation: Latest Generative AI Advancements & Success Story

Mutation Testing for Task-Oriented Chatbots

Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...

Programming Foundation Models with DSPy - Meetup Slides

Monitoring and Managing Anomaly Detection on OpenShift.pdf

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...

Nordic Marketo Engage User Group_June 13_ 2024.pptx

“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...

Computational journalism projects

1. Reporterslab.org Presentation for computational journalism students February 2012

2. STRUCTURED DATA ... and most reporters’ inability to deal with it

3. New York Times reporters used Word searches and annotations to analyze WikiLeaks documents in 2010 and 2011.

4. PANDA project trying to help gather data inside newsrooms

5. Barriers to structured data analysis in the newsroom
• Expensive
• Too hard to collect
• It takes practice
• It takes patience
• Once collected, data has a short shelf life: its value inside the newsroom effectively ends once a story is published.

6. Web-scraping software: ephemeral or too expensive for a task not viewed as mission-critical.

7. Solutions
• User-friendly tool for scraping websites for structured data
• Packages of algorithms from fraud and other forensic fields for use with public records datasets online
• Packages of queries and statistical tests for money, dates, geographical identifiers, names and codes, presented in standard English
• Tools for fuzzy matching of datasets: include scoring, best match likelihood, interactive machine learning for different datasets
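As a rough illustration of the fuzzy-matching idea on this slide, the sketch below scores every candidate pair of names and keeps the best match above a cutoff. It is not from the presentation: the function names, the normalization rules and the 0.8 threshold are all assumptions chosen for the example, and it uses only Python's standard `difflib` module rather than any tool named in the deck.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace (assumed rules)."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def score(a: str, b: str) -> float:
    """Similarity between two names, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(left, right, threshold=0.8):
    """For each name in `left`, return (name, best candidate or None, score).

    `threshold` is a hypothetical cutoff; a real project would tune it
    against hand-checked pairs from its own datasets.
    """
    results = []
    for name in left:
        if not right:
            results.append((name, None, 0.0))
            continue
        # Score every candidate and keep the highest-scoring one.
        best_score, best = max((score(name, cand), cand) for cand in right)
        match = best if best_score >= threshold else None
        results.append((name, match, round(best_score, 2)))
    return results


if __name__ == "__main__":
    contractors = ["Acme Corp.", "Smith, John"]
    donors = ["ACME Corp", "Zenith LLC"]
    for row in best_matches(contractors, donors):
        print(row)
```

Returning the score alongside each match lets a reporter review borderline pairs by hand instead of trusting the cutoff blindly.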

8. TOO MUCH MATERIAL With too little information

9. Too many sources with too little news
• Twitter, Facebook, LinkedIn and other social media
• RSS feeds from other news organizations and blogs
• Press releases from government agencies or beat subjects
Lack of archiving is just as troubling as the lack of structure. Reporters can’t hold the powerful accountable without information from the past.

10. Solutions
• Archiving users’ feeds locally or in the cloud
• Mash up social media and RSS feeds into an app that reveals more insight into the sources
• Formalize each reporter’s definition of “news” through machine learning
• Alerts for important source material. Example: changing time of a press conference.

11. The buried treasure UNUSABLE RECORDS

12.

13. Solutions
• Visual extractor of data from scanned forms
• Separate scanned boxes of documents into their pieces for further analysis
• Use speech recognition tools on government audio and video
• OCR video to find the speaker at a hearing

14.

15. For unstructured data ANTIQUATED METHODS

16. Our way
• Hand-enter individual items into spreadsheets
• Transcribe interviews, hearings and other audio and video content for searching
• Read each document
A newer way
• Leverage web scraping and paid crowdsourcing for data entry (MT)
• Use speech recognition for the first pass on searchable audio and video
• Use clustering, information extraction and other methods for overview of documents

17. Reporterslab.org working to tame audio and video

18. Associated Press project to bring order to unstructured data

19. Wordseer for historical text

20. Jigsaw

21.

22. REPORTERSLAB.ORG Creating sample data and documents for researchers based on real stories
