This document discusses Big Data Europe (BDE), an open source big data platform. It provides an overview of BDE's goals, architecture, and applications. The key points are:
1) BDE's goals are to make it easy to install, develop for, deploy, and integrate big data applications. It aims to unlock the value of data through an open platform.
2) BDE supports a variety of frameworks and uses Docker to package components. Its architecture includes layers for resources, data, processing, and applications.
3) BDE is being applied to challenges in domains like health, transport, energy, and security. Example applications analyze traffic patterns, perform predictive maintenance, and detect changes in infrastructure.
Hajira Jabeen introduces the Big Data Europe Integrator Platform. The deck also includes the slides used to summarise the other presentations in the launch webinar.
30. Platform installation
◎ Manual installation guide
◎ Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎ Screencasts
33. Platform development
◎ High level picture
o docker-compose.yml describes pipeline topology
◎ BDE provided components
o extend template image with your code
◎ New components
o build a Docker image for your component
o this is your own little Virtual Machine for your component
◎ Sharing
o publish topology as Git repository
o publish new components on Docker Hub
34. Development
◎ Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with a custom algorithm or data
◎ Published components
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki
38. Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
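The point that components may depend on each other can be illustrated with a small dependency-ordering sketch. The slides do not say how init_daemon resolves dependencies, so this is not its implementation; the component names are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical components, each mapped to the components it must wait for.
# These names are illustrative and are not taken from the BDE platform.
DEPENDS_ON = {
    "namenode": set(),
    "datanode": {"namenode"},
    "spark-master": {"namenode"},
    "spark-worker": {"spark-master"},
    "demo-app": {"datanode", "spark-worker"},
}


def startup_order(depends_on):
    """Return component names so that every dependency precedes its dependents.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
print(order)
```

A step that needs manual intervention could be modelled as one more node that is only marked finished once an operator confirms it, but that goes beyond what the slide states.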
40. Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
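As a rough illustration of a stack described in Compose terms, the sketch below builds a Compose-style structure in Python: each service entry carries the component configuration, and the `depends_on` links give the application topology. The image names and environment variables are placeholders, not actual BDE images. The output is JSON, which is also valid YAML.

```python
import json


def service(image, depends_on=(), environment=None):
    """Build one Compose-style service entry (the component configuration)."""
    entry = {"image": image}
    if depends_on:
        entry["depends_on"] = list(depends_on)
    if environment:
        entry["environment"] = dict(environment)
    return entry


# Hypothetical stack: image names and variables are placeholders.
stack = {
    "services": {
        "namenode": service("example/hadoop-namenode",
                            environment={"CLUSTER_NAME": "demo"}),
        "datanode": service("example/hadoop-datanode",
                            depends_on=["namenode"]),
        "spark-master": service("example/spark-master",
                                depends_on=["namenode"]),
        "demo-app": service("example/demo-app",
                            depends_on=["datanode", "spark-master"]),
    }
}


def undefined_dependencies(stack):
    """List (service, dependency) pairs that point at a missing service."""
    services = stack["services"]
    return [(name, dep)
            for name, entry in services.items()
            for dep in entry.get("depends_on", [])
            if dep not in services]


print(json.dumps(stack, indent=2))
```

Checking `undefined_dependencies(stack)` before publishing a topology is one cheap way to catch a misspelled component name.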
48. Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data by adding meaning to it!
49. Semantic Data Lake (Ontario)
◎ Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎ Data Lake
o Add a Semantic layer on top of the source datasets
o The data is semantically lifted using existing ontology terms
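A minimal sketch of what "semantically lifting" a raw record might look like, assuming a hand-written mapping from source field names to existing ontology terms. The mapping, the sample record, and the subject IRI are illustrative assumptions; the two FOAF properties are real vocabulary terms. This is not the BDE or Ontario implementation.

```python
# Hypothetical mapping from raw field names to existing ontology terms.
FIELD_TO_TERM = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}


def lift(record, subject_iri, mapping=FIELD_TO_TERM):
    """Turn one raw record into (subject, predicate, object) triples.

    Fields without a mapping are skipped rather than guessed, so the raw
    source stays untouched and the semantic layer sits on top of it.
    """
    return [(subject_iri, mapping[field], value)
            for field, value in record.items()
            if field in mapping]


# Illustrative raw record; "tmp_col" has no ontology term and is ignored.
raw = {"name": "Alice", "homepage": "http://example.org/alice", "tmp_col": "x"}
triples = lift(raw, "http://example.org/person/1")

for triple in triples:
    print(triple)
```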
54. BDE vs Hadoop distributions
Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
--- | --- | --- | --- | --- | ---
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | No | No | No | No | Yes
High Availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | Yes | Yes | Yes | Yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom
55. BDE vs Hadoop distributions
◎ BDE is not built on top of existing distributions
◎ Targets
o Communities
o Research institutions
◎ Bridges scientists and open data
◎ Multi Tier research efforts towards Smart Data