The document describes Lily, a platform for managing and analyzing large datasets in real-time. It handles data storage, indexing, and recommendations. Lily uses distributed processing on technologies like BigTable and Solr to scale across infrastructure. The document outlines Lily's current roadmap and provides a sample schema for modeling book data with fields.
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris
Data growth is rapidly surpassing Moore's Law, as data sets are growing increasingly large, hence deriving insights from these large data sets is becoming more and more complex. Lily, a software product made by Outerthought, allows you to store, index and search vast quantities of data. In the next few years, successful business models will be based on monetization of data. Steven Noels will highlight the raison d'être of Lily, discussing challenges that every data-intensive organisation encounters.
Location Intelligence: when Business Intelligence meets Cartography SpagoWorld
Presentation on SpagoBI Location Intelligence by Andrea Gioia, Engineering IT Solution Architect & SpagoBI Consultant, shown at Solutions Linux 2010, within OW2 Conferences.
This document discusses how new technologies like big data and stream computing are driving a transition to smarter computing. It explains that businesses need to adapt to handle massive, growing amounts of data from diverse sources in real-time. IBM offers integrated solutions like InfoSphere Streams to help organizations collect, manage, analyze and gain insights from both traditional and non-traditional data in motion or at rest.
This document discusses using machine learning and MapReduce with Hadoop to perform predictive analysis on health insurance claims data. It proposes extracting data from online forums, searching for correlations between medical keywords and claims, and using support vector regression to build a predictive model. The analysis would be run on Amazon Elastic MapReduce for scalability and cost efficiency. Future work may include additional data sources and model enhancements.
(1) Big Data refers to the large volumes of various types of data that are constantly being generated from numerous sources; (2) Analyzing big data can provide valuable insights and opportunities, but traditional systems are limited in their ability to process large, diverse datasets; (3) IBM offers a big data platform that can integrate, manage, and analyze petabytes of data from many sources using technologies like Hadoop and stream computing. The platform allows organizations to gain insights from all available data in real-time.
Exploring Process Barriers to Release Public Sector Information in Local Gove...Peter Conradie
Conradie, P. & Choenni, S., 2012. Exploring Process Barriers to Release Public Sector Information in Local Government. In 6th International Conference on Theory and Practice of Electronic Governance. Albany, New York, pp. 5–13.
The webinar discusses Lily, a smart Big Data solution. It provides an overview of opportunities and challenges of Big Data, and how Lily addresses these through its flexible storage and scaling architecture. Use cases are presented for Lily in media, education and retail companies. An adoption program is described to help companies explore, adopt and deploy Lily.
The document discusses Outerthought's vision and strategy for addressing the growing amount of digital content and user data. Their mission is to become the premier provider of content application technologies for the emerging "content as opportunity" age. They are developing technologies like Lily, a NoSQL content repository, and frameworks to help clients capture, process, and extract knowledge from large amounts of user data at scale. Their partnership strategy involves collaborating with technology companies, domain experts, and businesses to develop customized solutions and mutually support an open software platform.
The document discusses the Lily project, which aims to provide smart data at scale. It describes Lily's architecture and components, including using HBase for storage, Solr for indexing, and a real-time data processing engine. It outlines Lily's roadmap, which includes releasing version 1.0 in April 2011 and adding real-time analytics and data insights.
The document discusses the challenges of managing large-scale data and the need for real-time analytics. It proposes an integrated approach called Lily that can store all data, perform real-time processing, and provide insights by combining the data with domain knowledge. This moves beyond current batch processing methods to enable interactive use of data and instant feedback. Lily aims to help organizations maximize the value of the data they collect.
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyCloudera, Inc.
Lily is a repository made for the age of Data, and combines CDH, HBase and Solr in a powerful, high-level, developer-friendly backing store for content-centric applications with the ambition to scale. In this session, we highlight why we chose HBase as the foundation for Lily, and how Lily will allow users to not only store, index and search vast quantities of data, but also to track audience behaviour and generate recommendations, all in real-time.
This document provides an overview of NoSQL and Hadoop technologies. It discusses the trends driving these technologies like increasing data size, connectivity of data, semi-structured data, and decoupled service architectures. It introduces concepts from academic research like Amazon Dynamo, Google BigTable, and Brewer's CAP theorem. Specific technologies are explained like Hadoop for processing large datasets using MapReduce on the Hadoop Distributed File System.
The document discusses the rise of big data and NoSQL databases. It notes that organizations are drowning in large amounts of data from various sources like user-generated content. However, traditional relational databases struggle to handle this type and volume of semi-structured data in a distributed, scalable manner. This has led to the emergence of NoSQL databases that are more flexible and better suited for the distributed, large-scale requirements of big data.
The document discusses open data and the Open Data Institute (ODI). It provides examples of open data success like Google, Wikipedia, and websites. It defines open data as making data available to create new uses, markets, and value while improving quality and lowering costs. An example is shown where open prescribing data in the UK could save over £1.5 billion. The ODI mission is outlined as catalyzing open data culture to create economic, environmental, and social value. It works to unlock supply and demand of data through various programs and training.
Lily for the Bay Area HBase UG - NYC editionNGDATA
The document discusses Lily, an open source content application developed by Outerthought that uses HBase for scalable storage and SOLR for search. It provides a high-level overview of Lily's architecture, which maps content to HBase, indexes it in SOLR, and uses a queue implemented on HBase to connect updates between the systems. Future plans for Lily include a 1.0 release with additional features like user management and a UI framework.
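The flow described above (write to the store, queue the update, let an indexer bring the search index up to date) can be sketched in a few lines. This is a toy, in-memory illustration only: `ToyRepository` and all of its methods are hypothetical names, a dict stands in for HBase, an inverted index stands in for SOLR, and a `deque` stands in for the HBase-backed queue. It is not Lily's actual API.

```python
from collections import deque
from typing import Deque, Dict, Set


class ToyRepository:
    """Hypothetical in-memory model of a store, an index, and a queue between them."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}        # stands in for HBase
        self.index: Dict[str, Set[str]] = {}   # stands in for SOLR: term -> record ids
        self.queue: Deque[str] = deque()       # stands in for the HBase-backed queue

    def save(self, record_id: str, text: str) -> None:
        # The write goes to the store first; indexing happens later.
        self.store[record_id] = text
        self.queue.append(record_id)

    def run_indexer(self) -> int:
        """Drain the queue and re-index every queued record; returns the count."""
        processed = 0
        while self.queue:
            record_id = self.queue.popleft()
            # Remove stale postings left over from an earlier version of the record.
            for ids in self.index.values():
                ids.discard(record_id)
            for term in self.store[record_id].lower().split():
                self.index.setdefault(term, set()).add(record_id)
            processed += 1
        return processed

    def search(self, term: str) -> Set[str]:
        return set(self.index.get(term.lower(), set()))


repo = ToyRepository()
repo.save("doc-1", "Scalable content store")
before = repo.search("content")   # indexer has not run yet
indexed = repo.run_indexer()
after = repo.search("content")
```

In this sketch a search issued before the indexer runs does not see the new record, which is the usual trade-off of decoupling storage from indexing through a queue.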
The document summarizes research in business analytics conducted at Gradiant, a Galician research center. Key areas of expertise include data/graph mining, big data and business intelligence technologies. Gradiant has hardware and software resources for these areas and is involved in various national and international projects applying business analytics to domains like emergency healthcare, marketing, and telecommunications network analysis. Gradiant also develops related intellectual property and publishes research results.
My keynote talk at San Diego Superdata conference, looking at history and current state of Analytics and Data Mining, and examining the effects of Big Data
KVIV / NoSQL : the new generation of database serversNGDATA
presentation for KVIV on 2010/06/03
as usual with my presentations: if you weren't there, you missed half of the fun, as some of the more important ideas are not in the slides
Learn how Ocean Protocol can be used to further scientific research. A presentation by Ocean's Lead Data Scientist Marcus Jones at Blockchain for Science Conference in Berlin on November 3, 2019.
Building a CMS on top of NoSQL (for ParisJUG)NGDATA
The document discusses building a content management system (CMS) using NoSQL technologies like HBase. It describes some of the scaling challenges faced with the traditional CMS architecture. These include issues with caching, access control computations, and data merging across different data stores. It explores using a database like HBase that can scale out through horizontal partitioning and replication to address these problems. Key requirements for the NoSQL database are also outlined.
1) The document discusses predictive analytics and business intelligence in payments and commerce. It outlines models like Type 0 and Type A analytics engagement and critical success factors.
2) Mobile technology is converging and will be a key driver of growth, with payments initiated by mobile expected to outnumber plastic card transactions within 10 years.
3) The future of payments and business is focused on predictive, accessible analytics solutions and managing mobile and technology convergence.
In many countries across the world, discussions, policies and developments are actively emerging around Open Data. It is believed that opening government data can have significant economic potential, generating new industries and innovations.
This presentation provides a short introduction to Open Data, with a focus on available public data from the National Institute of Statistics of Rwanda (NISR) - http://statistics.gov.rw. The presentation also explores the roles played by the group of actors involved in strengthening the Open Data ecosystem in Rwanda and the steps involved in monetizing open data.
W-JAX Keynote - Big Data and Corporate Evolutionjstogdill
A look at corporate evolution from the industrial revolution to the information age - with a focus on how Big Data will make an impact.
Presented at W-JAX Java Conference in Munich Germany, 11-8-11
This document summarizes a presentation about establishing a new research center called FIDES to focus on applying telecommunications research to financial information systems. It discusses challenges in the financial services industry and how the center aims to address issues like lack of standards, compliance, and resource optimization through interdisciplinary projects. The presentation outlines goals like developing specialized software, frameworks and automated mechanisms to improve performance and compliance. It proposes that the center will be industry-driven with sponsorship and guidance from an advisory board with expertise in various financial domains.
NGDATA brings big data technology and machine intelligence together, allowing organizations to capitalize on the massive amounts of data that is generated today. NGDATA develops Lily, a big data management platform that offers an easy way to extract powerful business insights in real-time and benefit from enriched data to make an immediate impact on business performance. NGDATA's global partner community provides expert services best suited to meet evolving big data needs. NGDATA is a privately-held company with headquarters in Ghent, Belgium. More information and recent updates are available at www.ngdata.com.
Similar to From Content Storage to Scaling Smart Data (20)
The document repeatedly lists an address in Zwijnaarde, Belgium and a website. It provides no other context or information beyond stating "Two things more..." in the final line.
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
Introduction to our web-scale tools for "RESTful app development" and "big data store and search" (Lily)
Slides presented at devoxx-2010
http://devoxx.com/display/Devoxx2K10/Scalable+and+RESTful+web+applications++at+the+crossroads+of+Kauri+and+Lily
Introduction to RESTful development with Java, focusing on JAX-RS (and kauriproject.org)
Slides presented at devoxx-2010.
See http://devoxx.com/display/Devoxx2K10/RESTful+development+with+Java for abstract.
N-O-SQL, new database technologies on the riseNGDATA
The document discusses new database technologies called NoSQL or non-relational databases that are gaining popularity. It provides an overview of the reasons for the rise of these technologies, including the need to scale databases for large amounts of data and high user volumes. It also discusses some of the core concepts behind NoSQL databases like document stores, key-value stores, and column-oriented databases.
1) The document discusses NoSQL databases and provides advice on choosing a platform, hardware requirements, backup/replication strategies, and common issues like bottlenecks and consistency.
2) It recommends analyzing your problem and understanding the benefits of your chosen platform, as well as considering the CAP theorem.
3) Contact information is provided for several NoSQL experts on Twitter and mailing lists to stay up to date on new developments.
HBase is a distributed, column-oriented database that provides random access reads and writes on top of HDFS. It uses a multi-dimensional key-value data model where keys are composed of a table, row, column family, column qualifier, and timestamp. Column families allow for locality of storage and efficient access. Data is stored at the intersection of row keys and column families/qualifiers, which is sometimes called a "cell". HBase can be used as a normal datastore with static column qualifiers or in more advanced ways by using dynamic qualifiers to build secondary indexes or embed data in the qualifier.
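The key structure described above can be pictured as a nested map. The following is a minimal sketch of that data layout only, not of the HBase client API: `ToyTable`, `put`, `get` and `scan_row` are hypothetical names, and everything lives in memory.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# (row key, column family, column qualifier) identifies a cell within one table.
CellKey = Tuple[str, str, str]


class ToyTable:
    """Hypothetical stand-in for one table: cell key -> {timestamp: value}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._cells: Dict[CellKey, Dict[int, bytes]] = defaultdict(dict)

    def put(self, row: str, family: str, qualifier: str, value: bytes, ts: int) -> None:
        self._cells[(row, family, qualifier)][ts] = value

    def get(self, row: str, family: str, qualifier: str,
            ts: Optional[int] = None) -> Optional[bytes]:
        versions = self._cells.get((row, family, qualifier))
        if not versions:
            return None
        if ts is None:
            ts = max(versions)  # default to the newest version
        return versions.get(ts)

    def scan_row(self, row: str, family: str) -> Dict[str, bytes]:
        """Latest value of every qualifier in one family of one row."""
        out: Dict[str, bytes] = {}
        for (r, f, q), versions in self._cells.items():
            if r == row and f == family:
                out[q] = versions[max(versions)]
        return out


books = ToyTable("books")
# Static qualifier: "title" is a fixed column name, with two versions.
books.put("book-1", "meta", "title", b"Dune", ts=1)
books.put("book-1", "meta", "title", b"Dune (revised)", ts=2)
# Dynamic qualifiers: the tag itself is the qualifier, the value is empty.
books.put("book-1", "tags", "scifi", b"", ts=1)
books.put("book-1", "tags", "classic", b"", ts=1)

latest_title = books.get("book-1", "meta", "title")
first_title = books.get("book-1", "meta", "title", ts=1)
tags = sorted(books.scan_row("book-1", "tags"))
```

The `tags` family shows the "embed data in the qualifier" pattern: listing the qualifiers of a row is enough to recover the tags, without reading any values.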
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
From Content Storage to Scaling Smart Data
1. Lily: Smart data, at scale, made easy
from content storage to scaling smart data
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Monday 6 June 2011
3. the pain
[Chart labels: data, moore, need for distributed processing]
4. the pain
» growth of data sets
» smart businesses need to apply analytics to activities
» doing business online means real-time
» talent shortage
5. LILY
The Real-time Platform built for the Age of Data.
We manage, track and measure your data and users,
and do the mat(c)hmaking in-between:
» provide you with business intelligence and analytics
» harvest user profiles and learn their interests
» dynamically engage your users using quality recommendations
6. where would you use lily?
» large collections of data
» content repositories
» library catalogs
» (media) asset management
» product catalogs
» ‘live’ archives
» large groups of users
» e-commerce / retail
» news / media
» ... if you want to use big data, but you need easy.
7. this is where the magic happens
9. beyond content management: data + analytics
[Diagram labels: recommendations, call to action, personalised, revenue, product / service, audience data]
10. LILY 2.0: smart data
[Diagram: data processing leading to SMARTER DATA. Labels: relations, recommendations, semantic augmentation; Analytics (usage metrics); domain knowledge (patterns, rules, keywords, lists, ...)]
11. roadmap
» now: highly-scalable data repository: store, index and search
» next: with real-time usage stats gathering and analytics
» later: and built-in context- and user-sensitive recommendations
» built on top of Google BigTable / HBase / Solr
» identical, robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo!
» scales widely over distributed (cloud) infrastructure
12. Lily Repository Model
16. HBase indexing & RowLog Library
HBase indexing
» building and querying indexes, GAE-style
content table:
rowkey  col   col
A       val3  foo6
B       val2  foo7
order index table:
rowkey
val2-B
val3-A
RowLog library
» need for sync/async operations
» updating of secondary indexes (e.g. link tables)
» feeding of Indexer (= indexes Lily-content into Solr)
» not: transactions
» need for distribution and durability
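The index layout on this slide (index row keys built as value plus content row key, kept in sorted order) can be illustrated with a small in-memory sketch. This is not Lily or HBase code: the class `IndexedTable` and its methods are hypothetical names, a sorted Python list stands in for HBase's sorted row keys, and the sketch assumes neither values nor row keys contain a "-".

```python
from bisect import bisect_left, insort


class IndexedTable:
    """Toy stand-in for a content table plus one secondary index table."""

    def __init__(self, indexed_col):
        self.indexed_col = indexed_col
        self.rows = {}    # content table: rowkey -> {column: value}
        self.index = []   # index table: sorted "value-rowkey" keys

    def put(self, rowkey, columns):
        # Drop the stale index entry if the row already existed.
        old = self.rows.get(rowkey)
        if old is not None and self.indexed_col in old:
            stale = f"{old[self.indexed_col]}-{rowkey}"
            i = bisect_left(self.index, stale)
            if i < len(self.index) and self.index[i] == stale:
                self.index.pop(i)
        self.rows[rowkey] = dict(columns)
        # Here the index is updated synchronously with the content write.
        if self.indexed_col in columns:
            insort(self.index, f"{columns[self.indexed_col]}-{rowkey}")

    def scan_in_order(self):
        """Content row keys ordered by the indexed column value."""
        return [key.rsplit("-", 1)[1] for key in self.index]

    def lookup(self, value):
        """Content row keys whose indexed column equals value."""
        prefix = f"{value}-"
        i = bisect_left(self.index, prefix)
        found = []
        while i < len(self.index) and self.index[i].startswith(prefix):
            found.append(self.index[i][len(prefix):])
            i += 1
        return found


table = IndexedTable("col1")
table.put("A", {"col1": "val3", "col2": "foo6"})
table.put("B", {"col1": "val2", "col2": "foo7"})
print(table.index)
print(table.scan_in_order())
```

With the two rows from the slide, the index holds the keys val2-B and val3-A, so an ordered scan visits B before A. In this sketch the index write sits inside `put`; deferring it to a durable queue would be one way to get the asynchronous variant the slide mentions.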
17. The Lily Indexer
» indexing of multiple versions of a record
» incremental index updating
» blob content extraction
» denormalization
» batch index building
» sharding towards multiple SOLR instances
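The slide does not say how records are assigned to Solr instances. As one possible illustration, the sketch below picks a shard from a stable hash of the record id. The function `pick_shard` and the shard URLs are hypothetical, and hash-modulo routing is an assumption, not Lily's documented behaviour.

```python
import hashlib


def pick_shard(record_id, shards):
    """Map a record id to one shard, the same way on every call."""
    # md5 is used only as a stable hash (not for security); Python's
    # built-in hash() differs between processes for strings.
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Hypothetical Solr endpoints, for illustration only.
shards = [
    "http://solr1.example.org/solr",
    "http://solr2.example.org/solr",
    "http://solr3.example.org/solr",
]
print(pick_shard("book-123", shards))
```

A drawback of plain modulo routing is that changing the number of shards reassigns most records, which is why a real deployment might prefer a fixed mapping or consistent hashing.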
18. status june 2011
» Lily 1.0.1 released - developing since Q4/09
» some customers - DIY retail / media / news
» e-commerce platform project
» Lily as the data (integration) tier
» first contrib: FrogPond (annotated Java <> Lily mapper)
https://bitbucket.org/calmera/frogpond
19. Next up: usage stats
» sits in CRUD-path
» tracks users ops against records
» from both perspectives
» arbitrary K/V properties: time, location, ...
» automatically builds user profiles (as records)
» tied to records ops
» indexed access
» time dimension: trending
[Diagram labels: record, user, interactions, indexes, recommendations, time]
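The bullets above can be sketched as a small tracker that records each operation from both the record and the user perspective. All names here (`UsageTracker`, `track`, `profile`, `trending`) are hypothetical and in-memory dictionaries stand in for the repository; this shows the idea only, not the planned Lily API.

```python
import time
from collections import Counter, defaultdict


class UsageTracker:
    """Records user operations against records, viewed from both sides."""

    def __init__(self):
        self.by_record = defaultdict(list)  # record id -> events
        self.by_user = defaultdict(list)    # user id -> events

    def track(self, user, record, op, timestamp=None, **props):
        # props carries the arbitrary key/value properties (location, ...).
        event = dict(props, user=user, record=record, op=op,
                     time=time.time() if timestamp is None else timestamp)
        self.by_record[record].append(event)
        self.by_user[user].append(event)
        return event

    def profile(self, user):
        """A user profile derived from the tracked events."""
        events = self.by_user.get(user, [])
        return {
            "user": user,
            "records": {e["record"] for e in events},
            "ops": dict(Counter(e["op"] for e in events)),
        }

    def trending(self, since, top=3):
        """Records with the most events at or after the given time."""
        counts = Counter()
        for record, events in self.by_record.items():
            recent = sum(1 for e in events if e["time"] >= since)
            if recent:
                counts[record] = recent
        return [record for record, _ in counts.most_common(top)]


tracker = UsageTracker()
tracker.track("alice", "book1", "read", timestamp=100, location="Gent")
tracker.track("bob", "book1", "read", timestamp=200)
tracker.track("alice", "book2", "update", timestamp=50)
print(tracker.trending(since=90))
print(tracker.profile("alice"))
```

Calling `track` from the create, read, update and delete paths is what "sits in CRUD-path" would amount to in this sketch; the timestamp on each event is what makes the trending query possible.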
20. from usage stats to recommendations ‘light’
» grouping of users based on
  » shared properties
  » shared record access
» grouping of records based on
  » shared properties
  » shared user operations
[Diagram labels: record, user, connections, recommendations]
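Grouping users by shared record access already gives a simple recommender: suggest records seen by users whose access overlaps with yours. The sketch below is one such co-access scheme; the function `recommend` and the sample data are hypothetical, and the overlap-count scoring is an assumption rather than anything the slides specify.

```python
from collections import Counter


def recommend(access, user, top=3):
    """Suggest records for user, given a dict of user -> set of record ids."""
    seen = access.get(user, set())
    scores = Counter()
    for other, records in access.items():
        if other == user:
            continue
        overlap = len(seen & records)  # shared record access
        if overlap == 0:
            continue
        # Records the other user accessed that this user has not seen yet,
        # weighted by how much the two users have in common.
        for record in records - seen:
            scores[record] += overlap
    # Highest score first; ties broken by record id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [record for record, _ in ranked[:top]]


access = {
    "alice": {"b1", "b2"},
    "bob": {"b1", "b2", "b3"},
    "carol": {"b2", "b4"},
    "dave": {"b9"},
}
print(recommend(access, "alice"))  # ['b3', 'b4']
```

Here bob shares two records with alice and carol shares one, so b3 ranks above b4. A user with no overlapping access, such as dave, gets no suggestions.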
21. full-on recommendations
» look at real-time-capable Mahout algorithms
» pre-index or -calculate as much as possible
» save as secondary indexes
» present recommendations as part of record API
» allow user to contribute ‘domain knowledge’ to record processing pipeline
» pattern detection, keywords, ontologies, ...
26. Thank you !
for your attention
for your questions
» stevenn@outerthought.org
» @stevenn