I will show how to use Go's database/sql package, with MySQL as an example. Although the documentation is good, it's dense. I'll discuss idiomatic database/sql code, and cover some topics that can save you time and frustration, and perhaps even prevent serious mistakes.
Description
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
Abstract
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
This workshop will provide an overview of Scrapy, starting from the fundamentals and working through each new topic with hands-on examples.
Participants will come away with a good understanding of Scrapy, the principles behind its design, and how to apply the best practices encouraged by Scrapy to any scraping task.
Goals:
Set up a python environment.
Learn basic concepts of the Scrapy framework.
The ultimate guide for Elasticsearch pluginsItamar
Elasticsearch is a great product - for search, for scale, for analyzing data, and much more. But sometimes you need to do something that is not supported by Elasticsearch out of the box, and that's where plugins come into play.
Join me in this talk to explore the plugins land of Elasticsearch. We will discuss the various ways Elasticsearch can be extended, and the various types of plugins available to do that. By giving concrete examples and browsing the large selection of pre-made plugins, we will see how plugins can help us overcome various challenges. We will also discuss possible issues with plugins, and ways to work around them.
Finally, we will discuss scenarios in which custom plugin development is necessary and can really save the day. By showing a demo of one such scenario, and the way we built and debugged a plugin to solve it, we will complete the picture of the Elasticsearch plugin land, and hopefully inspire you to create your own!
Practical Elasticsearch - real world use casesItamar
Elasticsearch - a search and real-time analytics server based on Apache Lucene - is gaining a lot of popularity lately, and is being used world-wide to power many sophisticated systems. While many use it for the "standard" stuff (that is, simple full-text search and real-time log analysis), there are some really interesting usage patterns that can prove useful in many real-world scenarios. In this talk we will briefly talk about Elasticsearch and its common use-cases, and then showcase some less common use-cases leveraging Elasticsearch in an interesting and often times innovating ways.
A brief introduction to Elasticsearch and the many possibilities Elasticsearch offers in terms of search, data exploration and data aggregation. The presentation includes a brief introduction to search engine fundamentals and core features of Elasticsearch. The talk focuses on how we can navigate structured and unstructured data for search as well as aggregating and visualizing data for analytical purposes.
The talk aims to demonstrate case studies beyond traditional full-text-search, and hopefully show that Elasticsearch can help us build so much more than just a search engine.
Elasticsearch Distributed search & analytics on BigData made easyItamar
Elasticsearch is a cloud-ready, super scalable search engine which is gaining a lot of popularity lately. It is mostly known for being extremely easy to setup and integrate with any technology stack.In this talk we will introduce Elasticdearch, and start by looking at some of its basic capabilities. We will demonstrate how it can be used for document search and even log analytics for DevOps and distributed debugging, and peek into more advanced usages like the real-time aggregations and percolation. Obviously, we will make sure to demonstrate how Elasticsearch can be scaled out easily to work on a distributed architecture and handle pretty much any load.
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014gethue
This talk describes how open source Hue was built in order to provide a better Hadoop User Experience. The underlying technical details of its architecture, the lessons learned and how it integrates with Impala, Search and Spark under the cover will be explained.
The presentation continues with real life analytics business use cases. It will show how data can be easily imported into the cluster and then queried interactively with SQL or through a visual search dashboard. All through your Web Browser or your own custom Web application!
This talk aims at organizations trying to put a friendly “face” on Hadoop and get productive. Anybody looking at being more effective with Hadoop will also learn best practices and how to quickly get ramped up on the main data scenarios. Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. We cover details on how Hue interacts with the ecosystem and leverages the existing authentication and security model of your company.
To sum-up, attendees of this talk will learn how Hadoop can be made more accessible and why Hue is the ideal gateway for using it more efficiently or being the starting point of your own Big Data Web application.
Parse is a suite of cloud based APIs, services and libraries that focus on letting developers build out rich applications and less time dealing with the overhead of setting up and managing databases, push notifications, social sign on, analytics, and even hosting and servers.
In this series I'll overview the options around developing an application that leverages Parse, including using Cloud Code to deploy your Node.js app to Parse's own hosting service.
Presentation from OSDC 2008 in Sydney, on using Sphinx with Rails. The sldes don't work that well without the talk though, but it may be useful as a reference, I guess.
I will show how to use Go's database/sql package, with MySQL as an example. Although the documentation is good, it's dense. I'll discuss idiomatic database/sql code, and cover some topics that can save you time and frustration, and perhaps even prevent serious mistakes.
Description
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
Abstract
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
This workshop will provide an overview of Scrapy, starting from the fundamentals and working through each new topic with hands-on examples.
Participants will come away with a good understanding of Scrapy, the principles behind its design, and how to apply the best practices encouraged by Scrapy to any scraping task.
Goals:
Set up a python environment.
Learn basic concepts of the Scrapy framework.
The ultimate guide for Elasticsearch pluginsItamar
Elasticsearch is a great product - for search, for scale, for analyzing data, and much more. But sometimes you need to do something that is not supported by Elasticsearch out of the box, and that's where plugins come into play.
Join me in this talk to explore the plugins land of Elasticsearch. We will discuss the various ways Elasticsearch can be extended, and the various types of plugins available to do that. By giving concrete examples and browsing the large selection of pre-made plugins, we will see how plugins can help us overcome various challenges. We will also discuss possible issues with plugins, and ways to work around them.
Finally, we will discuss scenarios in which custom plugin development is necessary and can really save the day. By showing a demo of one such scenario, and the way we built and debugged a plugin to solve it, we will complete the picture of the Elasticsearch plugin land, and hopefully inspire you to create your own!
Practical Elasticsearch - real world use casesItamar
Elasticsearch - a search and real-time analytics server based on Apache Lucene - is gaining a lot of popularity lately, and is being used world-wide to power many sophisticated systems. While many use it for the "standard" stuff (that is, simple full-text search and real-time log analysis), there are some really interesting usage patterns that can prove useful in many real-world scenarios. In this talk we will briefly talk about Elasticsearch and its common use-cases, and then showcase some less common use-cases leveraging Elasticsearch in an interesting and often times innovating ways.
A brief introduction to Elasticsearch and the many possibilities Elasticsearch offers in terms of search, data exploration and data aggregation. The presentation includes a brief introduction to search engine fundamentals and core features of Elasticsearch. The talk focuses on how we can navigate structured and unstructured data for search as well as aggregating and visualizing data for analytical purposes.
The talk aims to demonstrate case studies beyond traditional full-text-search, and hopefully show that Elasticsearch can help us build so much more than just a search engine.
Elasticsearch Distributed search & analytics on BigData made easyItamar
Elasticsearch is a cloud-ready, super scalable search engine which is gaining a lot of popularity lately. It is mostly known for being extremely easy to setup and integrate with any technology stack.In this talk we will introduce Elasticdearch, and start by looking at some of its basic capabilities. We will demonstrate how it can be used for document search and even log analytics for DevOps and distributed debugging, and peek into more advanced usages like the real-time aggregations and percolation. Obviously, we will make sure to demonstrate how Elasticsearch can be scaled out easily to work on a distributed architecture and handle pretty much any load.
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014gethue
This talk describes how open source Hue was built in order to provide a better Hadoop User Experience. The underlying technical details of its architecture, the lessons learned and how it integrates with Impala, Search and Spark under the cover will be explained.
The presentation continues with real life analytics business use cases. It will show how data can be easily imported into the cluster and then queried interactively with SQL or through a visual search dashboard. All through your Web Browser or your own custom Web application!
This talk aims at organizations trying to put a friendly “face” on Hadoop and get productive. Anybody looking at being more effective with Hadoop will also learn best practices and how to quickly get ramped up on the main data scenarios. Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. We cover details on how Hue interacts with the ecosystem and leverages the existing authentication and security model of your company.
To sum-up, attendees of this talk will learn how Hadoop can be made more accessible and why Hue is the ideal gateway for using it more efficiently or being the starting point of your own Big Data Web application.
Parse is a suite of cloud based APIs, services and libraries that focus on letting developers build out rich applications and less time dealing with the overhead of setting up and managing databases, push notifications, social sign on, analytics, and even hosting and servers.
In this series I'll overview the options around developing an application that leverages Parse, including using Cloud Code to deploy your Node.js app to Parse's own hosting service.
Presentation from OSDC 2008 in Sydney, on using Sphinx with Rails. The sldes don't work that well without the talk though, but it may be useful as a reference, I guess.
Making your elastic cluster perform - Jettro Coenradie - Codemotion Amsterdam...Codemotion
In the past few years I have helped a lot of customers optimising their elastic cluster. With each version elasticsearch has more options to track performance of your nodes and recently profiling your queries was added. In this talk I am going to discuss the steps you have to take when starting with elasticsearch. The choices you have to make for the size of your cluster, the amount of indexes, amount of shards, choosing the right mappings, and creating better queries. After the setup I'll continue showing how to monitor your cluster and profile your queries.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...Big Data Spain
This talk describes how open source Hue [1] was built in order to provide a better Hadoop User Experience. The underlying technical details of its architecture, the lessons learned and how it integrates with Impala, Search and Spark under the cover will be explained.
Web APIs have revolutionized all kinds of products and services, and still continue to do so. Nowadays the most relevant architecture is REST along with the JSON media type. Furthermore, lots of specifications to serialize those media types are appearing. JSON API has released its first version last May.
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017Codemotion
Today’s applications are expected to provide powerful full-text search. But how does that work in general and how do I implement it on my site or in my application? Actually, this is not as hard as it sounds at first. This talk covers: * How full-text search works in general and what the differences to databases are. * How the score or quality of a search result is calculated. * How to implement this with Elasticsearch. Attendees will learn how to add common search patterns to their applications without breaking a sweat.
Philipp Krenn | Make Your Data FABulous | Codemotion Madrid 2018Codemotion
The CAP theorem is widely known for distributed systems, but it's not the only tradeoff you should be aware of. For datastores there is also the FAB theory and just like with the CAP theorem you can only pick two: fast, accurate, big. While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of accuracy tradeoffs Elasticsearch can take for terms aggregations, cardinality aggregations with HyperLogLog++, and the IDF part of full-text search. Or how to trade some speed or the distribution for more accuracy.
Elasticsearch sur Azure : Make sense of your (BIG) data !Microsoft
Sous licence Apache2, elasticsearch est un moteur de recherche puissant, distribué et scalable. Il fournit également des agrégations en temps réel en fonction de vos besoins. Couplé à Kibana, dashboard générique et hautement personnalisable, il vous permet de donner immédiatement du sens à vos données. En forte progression au niveau de son adhésion par les entreprises et les sites publics, découvrez ce que sont elasticsearch et Kibana et à quel point il est simple de les déployer facilement sur la plate-forme Windows Azure. Thomas et David illustreront à l'aide de cas clients les bénéfices obtenus à travers ces solutions.
Speakers : Thomas Conté (Microsoft), David Pilato (Elasticsearch)
The CAP theorem is widely known for distributed systems, but it's not the only tradeoff you should be aware of. For datastores, there is also the FAB theory and just like with the CAP theorem you can only pick two:
Fast: Results are real-time or near real-time instead of batch-oriented.
Accurate: Answers are exact and don't have a margin of error.
Big: You require horizontal scaling and need to distribute your data.
While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of accuracy tradeoffs Elasticsearch can take for terms aggregations, cardinality aggregations with HyperLogLog++, and the IDF part of the full-text search. Or how to trade some speed or the distribution for more accuracy.
Finding the right stuff, an intro to Elasticsearch with Ruby/RailsMichael Reinsch
Slides for my introduction to using Elasticsearch with Ruby/Rails, held at Tokyo Rubyist Meetup in October 2015.
In it I'm giving a short overview on how to use Elasticsearch using the official Elasticsearch Gems with a Ruby on Rails project. Starting with a simple example, expanding to multilanguage, multiobject indexes.
Also I'm shortly discussing integration testing and production hosting.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
63. Inverted index
1. “The quick brown fox
jumped over the lazy
dog”
2. “Quick brown foxes
leap over lazy dogs in
summer”
Term Doc_1 Doc_2
-------------------------
Quick | | X
The | X |
brown | X | X
dog | X |
dogs | | X
fox | X |
foxes | | X
in | | X
jumped | X |
lazy | X | X
leap | | X
over | X | X
quick | X |
summer | | X
the | X |
76. Term frequency
How often does the term appear in the field?
The more often, the more relevant.
Inverse document frequency
How often does each term appear in the
index? The more often, the less relevant. T
Field norm
How long is the field? The longer it is, the less
likely it is that words in the field will be
relevant.
85. Development directions:
1. Search API implementation
2. Field Storage API
3. Alternative backends
Available modules:
Elasticsearch
Elasticsearch Connector
Search API elasticsearch
86. Field Storage API implementation
Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011
87. Field Storage API implementation
Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011
Elasticsearch EntityFieldQuery sandbox
https://drupal.org/sandbox/asgorobets/2073151