Welcome to the Age of Data

NGDATA

An introductory presentation on Big Data and Hadoop for bigdata.be - presented 11/Jan/2012 at Accenture (Brussels).

Welcome to the age of data!
BIGDATA.BE

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

who am i

» Steven Noels
» Founder & VP Product
» Makers of Lily: Interactive Big Data platform
» Open Source / Apache Software Foundation
» co-founder bigdata.be

Houston,
we have
a problem.

We’re
drowning.

Drowning
in a
Sea
of
Data.

Mountains of
Metadata.

The firehose
of UGC.

Still, we
can’t make
much sense
of it.

... and we
throw a lot of
it away.

We regard
DATA as cost.

But data is an
opportunity.

Think about it.

advertisements

recommendations

fraud detection

eyeballs

churn

The future is
for
datanerds.

This is what Big
Data is about:
new insights,
new business.

3 issues for
BIG DATA

1
volume
need: more capacity
[Chart: data volume plotted against time, alongside a curve labeled "moore" (Moore's law)]

1
solution:
distributed
systems

1
distributed
systems are
hard.

2
database, data warehouse, analytics:
data shuffling, data duplication

3
“Top-performing organizations are twice as likely to apply analytics to activities.”
(MIT Sloan Management Review, Winter 2011)

enter

HBase

what is hadoop?

1 server: RAM, CPU, Disk

many servers
RAM: HBASE
CPU: MAP/REDUCE
DISK: HDFS

map/reduce

» Batch-oriented
» Data locality (code is shipped around)
» Heavy parallelization
» Process management
» Append-only files
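
As a rough illustration of the map/reduce model listed above, here is a toy, single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real cluster would run many map and reduce tasks in parallel, close to the data blocks they read.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "big insights"]))
    # prints {'big': 2, 'data': 1, 'insights': 1}
```

Each call to `map_phase` looks at one line only, which is what lets a framework spread the work over many machines and ship the code to where the data lives.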

Hadoop ecosystem
» Hadoop Common
» Subprojects
» Flume/SQOOP: Data collection systems for large distributed systems.
» HBase: A scalable, distributed database that supports structured data storage for large/wide tables.
» HDFS: A distributed file system that provides high throughput access to application data.
» Mahout: machine learning libraries
» Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
» MapReduce: A software framework for distributed processing of large data sets on compute clusters.
» Pig: A high-level data-flow language and execution framework for parallel computation.
» ZooKeeper: A high-performance coordination service for distributed applications.

[Diagram: the CDH stack with surrounding annotations]

CDH
» UI Framework (HUE), SDK (HUE SDK)
» Workflow (OOZIE), Scheduling (OOZIE), Metadata (HIVE)
» Data Integration (FLUME, SQOOP), Languages / Compilers (PIG, HIVE), Fast Read/Write Access (HBASE)
» Coordination (ZOOKEEPER)

Annotations
» High-level data model / easy API
» indexes
» Search
» usage metrics, analytics & recommendations (PIG, HIVE)
» Dev2Dev tutoring, integrated deployment and enterprise support

real-time big data architecture

speed layer (storm)
1. compensate for high latency of updates to serving layer
2. fast, incremental algorithms
3. batch layer eventually overrides speed layer

serving layer
1. random access to batch views
2. updated by batch layer

batch layer
1. store master dataset (append-only)
2. compute arbitrary views
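
The interplay of the three layers can be sketched in a few lines of Python. This is a single-process toy under loud assumptions: the class `PageViewCounts` and its methods are hypothetical, the "views" are plain page-view counters, and the batch run is assumed to be instantaneous, which a real batch layer is not.

```python
from collections import Counter


class PageViewCounts:
    """Toy model of the batch / serving / speed layers (hypothetical names)."""

    def __init__(self) -> None:
        self.master = []             # batch layer: append-only master dataset
        self.batch_view = Counter()  # serving layer: view computed by the batch layer
        self.speed_view = Counter()  # speed layer: incremental counts since last batch run

    def record(self, page: str) -> None:
        """Append an event to the master dataset and update the speed layer."""
        self.master.append(page)
        self.speed_view[page] += 1

    def run_batch(self) -> None:
        """Recompute the view from the full master dataset.

        The fresh batch view overrides the speed layer, whose counts are
        now redundant and are dropped.
        """
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, page: str) -> int:
        """Answer a query by merging the batch view with the speed layer."""
        return self.batch_view[page] + self.speed_view[page]


if __name__ == "__main__":
    counts = PageViewCounts()
    counts.record("home")
    counts.record("home")
    counts.run_batch()        # batch view now covers both events
    counts.record("home")     # only the speed layer knows about this one
    print(counts.query("home"))  # prints 3
```

Queries stay up to date between batch runs because the speed layer covers the gap; once the batch layer catches up, its result replaces the incremental one.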

Hadoop, interactive.

Analytics       Interactics        (RDBMS)
batch           interactive
static files    data management
10^18           10^15              10^9-12

[Diagram labels: news & media, commerce, finance, telecom; smart data management, insights, indexing, search, interactive, audience profile, metrics harvesting]

My baby: Lily.

The start of Lily.

Thank you!
for your attention
for your questions

» steven.noels@outerthought.com

» @stevenn

Welcome to the Age of Data

1. Welcome to the age of data! BIGDATA.BE IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

2. who am i » Steven Noels » Founder & VP Product » Makers of Lily: Interactive Big Data platform » Open Source / Apache Software Foundation » co-founder bigdata.be

3. Houston, we have a problem.

4. We’re drowning.

5. Drowning in a Sea of Data.

6. Mountains of Metadata.

7. The firehose of UGC.

8. Still, we can’t make much sense of it.

9. ... and we throw a lot of it away.

10. We regard DATA as cost.

11. But data is an opportunity.

12. Think about it.

13. advertisements

14. recommendations

15. fraud detection

16. eyeballs

17. churn

18. The future is for datanerds.

19. This is what Big Data is about: new insights, new business.

20. 3 issues for BIG DATA

21. 1 » volume » need: more capacity » [chart: data vs. moore over time]

22. 1 » solution: distributed systems

23. 1

24. 1 » distributed systems are hard.

25. 2 » database

26. 2 » database » data warehouse

27. 2 » database » data warehouse » analytics

28. 2 » database » data warehouse » analytics » data shuffling, data duplication

29. 3 » “Top-performing organizations are twice as likely to apply analytics to activities.” (MIT Sloan Management Review, Winter 2011)

30. enter

31.

32. HBase

33. what is hadoop? » 1 server: RAM, CPU, Disk

34. many servers » RAM: HBASE » CPU: MAP/REDUCE » DISK: HDFS

35. map/reduce

36. map/reduce » Batch-oriented » Data locality (code is shipped around) » Heavy parallelization » Process management » Append-only files
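The map/reduce idea on this slide can be sketched in a few lines of plain Python. This is a single-process illustration only, not the Hadoop API: the word-count task and the names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are assumptions made for the example. On a real cluster the framework ships the map code to the nodes that hold the data and runs many map and reduce tasks in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially over an in-memory batch."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input: two lines standing in for a large, append-only file.
counts = run_job(["big data is big", "data is an opportunity"])
```

Because each call to `map_phase` looks at one record only, the map step can be split across as many machines as there are chunks of input, which is where the heavy parallelization comes from.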

37. Hadoop ecosystem » Hadoop Common » Subprojects » MapReduce: A software framework for distributed processing of large data sets on compute clusters. » Pig: A high-level data-flow language and execution framework for parallel computation. » ZooKeeper: A high-performance coordination service for distributed applications. » Mahout: machine learning libraries » Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. » Flume/SQOOP: Data collection systems for large distributed systems. » HBase: A scalable, distributed database that supports structured data storage for large/wide tables. » HDFS: A distributed file system that provides high throughput access to application data.

38. CDH » UI Framework (HUE) » SDK (HUE SDK) » Workflow (OOZIE) » Scheduling (oozie) » Metadata (HIVE) » Data Integration (FLUME, SQOOP) » Languages / Compilers (PIG, HIVE) » Fast Read/Write Access (HBASE) » Coordination (ZOOKEEPER) » High-level data model / easy API » indexes » Search » Dev2Dev tutoring, integrated deployment and enterprise support » usage metrics, analytics & recommendations (PIG, HIVE)

39. real-time big data architecture » speed layer (storm): 1. compensate for high latency of updates to serving layer; 2. fast, incremental algorithms; 3. batch layer eventually overrides speed layer » serving layer: 1. random access to batch views; 2. updated by batch layer » batch layer: 1. store master dataset (append-only); 2. compute arbitrary views
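How the three layers on this slide fit together can be sketched as a toy example. Everything here is hypothetical: the page-view counting task, the `SpeedLayer` class and the functions `compute_batch_view` and `query` are illustrative names, and no Hadoop or Storm API is used. The point is only to show that a query merges the batch view with the speed layer's incremental view, and that a fresh batch run overrides what the speed layer had accumulated.

```python
class SpeedLayer:
    """Keeps a small incremental view of events the batch layer has not processed yet."""

    def __init__(self):
        self.realtime_view = {}

    def on_event(self, page):
        # Fast, incremental update: no recomputation over the full dataset.
        self.realtime_view[page] = self.realtime_view.get(page, 0) + 1

    def reset(self):
        # Called once a new batch view covers these events.
        self.realtime_view.clear()


def compute_batch_view(master_dataset):
    """Batch layer: recompute the view from the whole append-only master dataset."""
    view = {}
    for page in master_dataset:
        view[page] = view.get(page, 0) + 1
    return view


def query(page, batch_view, speed):
    """Serving side: merge the (stale) batch view with the speed layer's recent counts."""
    return batch_view.get(page, 0) + speed.realtime_view.get(page, 0)


# Hypothetical master dataset of page-view events.
master_dataset = ["home", "home", "about"]
batch_view = compute_batch_view(master_dataset)
speed = SpeedLayer()

# A new event arrives: it is appended to the master dataset and fed to the speed layer.
master_dataset.append("home")
speed.on_event("home")
before_rerun = query("home", batch_view, speed)

# The next batch run catches up, so the speed layer's state can be discarded.
batch_view = compute_batch_view(master_dataset)
speed.reset()
after_rerun = query("home", batch_view, speed)
```

Both queries return the same count. The speed layer only has to be correct for the window between two batch runs, which is why it can use simpler incremental algorithms.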

40. Hadoop, interactive. [diagram labels: Analytics, Interactics, (RDBMS); batch, interactive; static files, data management; 10^18, 10^15, 10^9-12]

41. My baby: Lily. [diagram labels: news & media, smart data management, insights, indexing, search, commerce, finance, interactive, audience profile, metrics harvesting, telecom]

42. The start of Lily.

43. Thank you! » for your attention » for your questions » steven.noels@outerthought.com » @stevenn
