Presentation by Maurice Bouwhuis (SARA/Vancis): 'How to understand big data by visualising it' during the Big Data Analytics seminar on 14 June in Almere
Crowdsourcing Approaches to Big Data Curation for Earth Sciences - Edward Curry
The document discusses crowdsourcing approaches to data curation for earth sciences. It covers several topics including motivation for data curation, data quality and curation processes, crowdsourcing, case studies on crowdsourced data curation, and setting up a crowdsourced data curation process. Specifically, it describes challenges with data quality, defines data curation and the role of data curators. It also outlines different types of data curation approaches based on who performs the curation (individual curators, departments, communities) and how it is done (manually, automated, crowdsourced).
The document discusses how big data is driving the need for new database technologies that can handle large, unstructured datasets and provide real-time analytics capabilities that traditional relational databases cannot support. It outlines the limitations of relational databases for big data and analyzes emerging technologies like Hadoop, NoSQL databases, and cloud computing that are better suited for storing, processing, and analyzing large volumes of diverse data types. The document also examines the infrastructure, architectural, and market requirements for big data platforms and products.
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup - Edward Curry
Data management efforts such as Master Data Management and Data Curation are popular approaches for achieving high-quality enterprise data. However, Data Curation can be heavily centralised and labour intensive, and its cost and effort can become prohibitively high. The concentration of data management and stewardship onto a few highly skilled individuals, such as developers and data experts, can be a significant bottleneck. This talk explores how to effectively involve a wider community of users in big data management activities. The bottom-up approach of involving crowds in the creation and management of data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. The talk discusses how crowdsourced data management techniques can be applied within an enterprise context.
Topics covered include:
- Data Quality And Data Curation
- Crowdsourcing
- Case Studies on Crowdsourced Data Curation
- Setting up a Crowdsourced Data Curation Process
- Linked Open Data Example
- Future Research Challenges
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ... - Edward Curry
Cyber-Physical Energy Systems (CPES) exploit the potential of information technology to boost energy efficiency while minimising environmental impacts. CPES can help manage energy more efficiently by providing a functional view of the entire energy system so that energy activities can be understood, changed, and reinvented to better support sustainable practices. CPES can be applied at different scales, from Smart Grids and Smart Cities to Smart Enterprises and Smart Buildings. Significant technical challenges exist in terms of information management, leveraging real-time sensor data, and coordinating the various stakeholders to optimize energy usage.
In this talk I describe an approach to overcome these challenges by re-using Web standards to quickly connect the required systems within a CPES. The resulting lightweight architecture leverages Web technologies including Linked Data, the Web of Things, and Social Media. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Management scenario in a smart building.
The document discusses Jongwook Woo and his background working with big data. It provides details on Woo's experience as a professor focusing on big data research and education partnerships. It also outlines some of the topics Woo covers in his presentations including introductions to big data, artificial intelligence, and the relationship between AI and big data. Key technologies like Hadoop, Spark, and neural networks are mentioned.
Big Data is growing rapidly in terms of volume, variety, and velocity. The cloud is well-suited to handle Big Data challenges by providing elastic and scalable infrastructure, which optimizes resources and reduces costs compared to traditional IT. In the cloud, users can collect, store, analyze and share large amounts of data without upfront investment, and scale easily as needs change. Real-world examples show how companies in industries like banking, retail, and advertising are using the cloud's Big Data services to gain insights from large datasets.
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing - Edward Curry
Recent advances in web technologies allow people to help solve complex problems by performing online tasks in return for money, learning, or fun. At present, human contribution is limited to the tasks defined on individual crowdsourcing platforms. Furthermore, there is a lack of tools and technologies that support matching tasks with appropriate users across multiple systems. A more explicit capture of the semantics of crowdsourcing tasks could enable the design and development of matchmaking services between users and tasks. The paper presents the SLUA ontology, which models users and tasks in crowdsourcing systems in terms of the relevant actions, capabilities, and rewards. This model describes different types of human tasks that help in solving complex problems using crowds. The paper provides examples of describing users and tasks in some real-world systems with the SLUA ontology.
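To make the modelling idea concrete, here is a minimal sketch of how a user and a task might be described in terms of capabilities and rewards, using Python's rdflib. The namespace IRI and the term names (User, Task, hasCapability, requiresCapability, offersReward, rewardAmount) are placeholders chosen for illustration, not the IRIs defined in the SLUA paper.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF, XSD

# Hypothetical namespace and term names; the real SLUA IRIs are defined in
# the paper and are not reproduced here.
SLUA = Namespace("http://example.org/slua#")

g = Graph()
g.bind("slua", SLUA)
g.bind("foaf", FOAF)

user = SLUA["user/alice"]
task = SLUA["task/label-flood-images"]

# A user with a capability, and a task that requires it and offers a reward.
g.add((user, RDF.type, SLUA.User))
g.add((user, FOAF.name, Literal("Alice")))
g.add((user, SLUA.hasCapability, SLUA.ImageClassification))

g.add((task, RDF.type, SLUA.Task))
g.add((task, SLUA.requiresCapability, SLUA.ImageClassification))
g.add((task, SLUA.offersReward, SLUA.MoneyReward))
g.add((task, SLUA.rewardAmount, Literal(0.05, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```

With descriptions like these, a matchmaking service could simply look for tasks whose required capabilities are a subset of a user's declared capabilities.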
Querying Heterogeneous Datasets on the Linked Data Web - Edward Curry
The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.
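As a small illustration of querying one such dataset, the sketch below sends a SPARQL query to DBpedia's public endpoint using the SPARQLWrapper library; the endpoint URL and the chosen ontology properties are assumptions of the example and may change as DBpedia evolves.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query DBpedia's public SPARQL endpoint for Irish cities and their population.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city ?population WHERE {
        ?city a dbo:City ;
              dbo:country dbr:Republic_of_Ireland ;
              dbo:populationTotal ?population .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```

The hard part discussed in the article starts exactly where this sketch stops: knowing, without prior knowledge of every dataset, which endpoint and which vocabulary terms to query in the first place.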
Viet-Trung Tran presents information on big data and cloud computing. The document discusses key concepts like what constitutes big data, popular big data management systems like Hadoop and NoSQL databases, and how cloud computing can enable big data processing by providing scalable infrastructure. Some benefits of running big data analytics on the cloud include cost reduction, rapid provisioning, and flexibility/scalability. However, big data may not always be suitable for the cloud due to issues like data security, latency requirements, and multi-tenancy overhead.
This document discusses high performance analytics and summarizes key capabilities of SAS Visual Analytics including easy analytics, visualizations for any skill level, calculated measures, automatic forecasting, and saved report packages. It also provides examples of public data sources that can be analyzed in SAS Visual Analytics including agricultural production and pricing data from India.
Interactive Water Services: The Waternomics Approach - Edward Curry
The document describes the Waternomics project, which aims to introduce demand response and accountability principles in the water sector through interactive water services. The project will develop a water information platform and tools to provide personalized water consumption and availability data to households, companies and cities. It will implement pilots in Greece, Italy and Ireland to test applications like water dashboards, prediction tools, simulations and games to increase user awareness and encourage behavioral changes. The platform uses linked open data, Internet of Things sensors and semantic technologies to integrate scattered water data sources and address challenges of data interoperability across domains.
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
"Industrializing Machine Learning – How to Integrate ML in Existing Businesses", Erik Schmiegelow, CEO at Hivemind Technologies AG
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Since 1996, Erik Schmiegelow has worked as a software architect and consultant, building large data processing platforms for companies such as NTT DoCoMo, Royal Mail, Siemens, E-Plus, Allianz and T-Mobile; until 2001 he was CTO at the Cologne-based digital agency denkwerk.
In 2007 he founded the telecommunications consulting agency Itellity, followed by Hivemind Technologies in 2014. Hivemind Technologies is a solutions and services company focused on big data analytics and stream processing technologies for web, social data and industrial applications. Erik studied computer science in Hamburg.
Data Scientists: Your Must-Have Business Investment - Kalido
This document summarizes a presentation on data science and the role of data scientists. It discusses how data science has evolved from earlier fields like statistics and data mining. It also profiles common skills of data scientists like data integration, programming, analytics, and communication. Additionally, the presentation outlines how data science differs from traditional business intelligence by focusing more on prediction and interacting with large, unstructured datasets in real-time. The document promotes data science as a key business investment and announces an upcoming summer webinar series on related topics.
A high level overview of common Cassandra use cases, adoption reasons, BigData trends, DataStax Enterprise and the future of BigData given at the 7th Advanced Computing Conference in Seoul, South Korea
Wikipedia (DBpedia): Crowdsourced Data Curation - Edward Curry
Wikipedia is an open-source encyclopedia, built collaboratively by a large community of web editors. The success of Wikipedia as one of the most important sources of information available today still challenges existing models of content creation. Despite the fact that the term ‘curation’ is not commonly addressed by Wikipedia’s contributors, the task of digital curation is the central activity of Wikipedia editors, who have the responsibility for information quality standards.
Wikipedia is already widely used as a collaborative environment inside organizations.
The investigation of the collaboration dynamics behind Wikipedia highlights important features and good practices which can be applied to different organizations. Our analysis focuses on the curation perspective and covers two important dimensions: social organization and artifacts, tools & processes for cooperative work coordination. These are key enablers that support the creation of high quality information products in Wikipedia’s decentralized environment.
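One practical way to observe this curation activity from the outside is the MediaWiki API's recent-changes feed. The sketch below pulls the latest article edits from English Wikipedia with Python's requests library; the query parameters are standard MediaWiki options, but the field selection and limits are illustrative choices rather than anything prescribed by the analysis above.

```python
import requests

# Fetch the most recent main-namespace edits on English Wikipedia.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": 0,            # article namespace only
        "rclimit": 10,
        "rcprop": "title|user|timestamp|comment",
        "format": "json",
    },
    headers={"User-Agent": "curation-monitor-sketch/0.1"},
    timeout=10,
)
resp.raise_for_status()

for change in resp.json()["query"]["recentchanges"]:
    print(change["timestamp"], change["title"], "edited by", change["user"])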
The document discusses big data and the open source big data stack. It defines big data as large datasets that are difficult to store, manage and analyze. Every day, 2.5 trillion bytes of data are created, with 90% created in the last two years. The open source big data stack includes tools like Hadoop, HBase, Hive and Pig that can handle large datasets through distributed computing across multiple servers. The stack provides flexibility, reliability, auditability and fast deployment at low cost compared to proprietary solutions.
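For a minimal taste of what distributed processing on such a stack looks like, here is a word count written with PySpark, standing in for the Hadoop/Pig/Hive tooling the document names; the HDFS input path is a placeholder.

```python
from pyspark.sql import SparkSession

# A minimal distributed word count over text files stored in HDFS.
spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
lines = spark.sparkContext.textFile("hdfs:///data/articles/*.txt")

counts = (lines.flatMap(lambda line: line.lower().split())   # line -> words
               .map(lambda word: (word, 1))                  # word -> (word, 1)
               .reduceByKey(lambda a, b: a + b))             # sum per word

for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, n)

spark.stop()
```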
Everything Has Changed Except Us: Modernizing the Data Warehouse - mark madsen
This document discusses modernizing data warehouse architecture to handle changes in data and analytics needs. It argues that the traditional data warehouse approach of fully modeling data before use is untenable with today's data volumes and rates of change. Instead, it advocates for a layered architecture that separates data acquisition, management, and delivery into independent but coordinated systems. This allows each layer and component to change at its own pace and focuses on data access and usability rather than strict control and governance. The goal is to design systems that can adapt to changes in data and analytics uses over time rather than trying to plan and control everything up front.
The New York Times is the largest metropolitan and the third largest newspaper in the United States. The Times website, nytimes.com, is ranked as the most popular newspaper website in the United States and is an important source of advertising revenue for the company. The NYT has a rich history of curating its articles, and its 100-year-old curated repository has ultimately defined its participation as one of the first players in the emerging Web of Data.
Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowdsourcing and semi-automated metadata tools for assistance.
E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.
Developing a Sustainable IT Capability: Lessons From Intel's Journey - Edward Curry
Intel Corporation set itself a goal to reduce its global-warming greenhouse gas footprint by 20% by 2012 from 2007 levels. Through the use of sustainable IT, the Intel IT organization is recognized as a significant contributor to the company’s sustainability strategy by transforming its IT operations and overall Intel operations. This article describes how Intel has achieved IT sustainability benefits thus far by developing four key capabilities. These capabilities have been incorporated into the Sustainable ICT Capability Maturity Framework (SICT-CMF), a model developed by an industry consortium in which the authors were key participants. The article ends with lessons learned from Intel’s experiences that can be applied by business and IT executives in other enterprises.
This document provides an overview of big data analytics, strategies, and the WSO2 big data platform. It discusses how the amount of data in the world is growing exponentially due to factors like increased data collection and the internet of things. It then summarizes the WSO2 big data platform for collecting, processing, analyzing and visualizing large datasets. Key components include the complex event processor for query processing and the business activity monitor for dashboards. The document concludes by outlining new developments and features being worked on, such as distributed complex event processing and machine learning integration.
Jubatus: Realtime deep analytics for BigData @ Rakuten Technology Conference 2012 - Preferred Networks
Currently, we face new challenges in realtime analytics of BigData, such as social monitoring, M2M sensors, online advertising optimization, smart energy management and security monitoring. To analyze these data, scalable machine learning technologies are essential. Jubatus is an open source platform for online distributed machine learning on BigData streams. We explain the technologies inside Jubatus and show how it can achieve realtime analytics for various problems.
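Jubatus's own API is not shown here; as a generic stand-in for the online-learning idea it embodies, the sketch below updates a scikit-learn SGDClassifier one mini-batch at a time on a synthetic stream, so the model never needs the full data set in memory.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Generic stand-in for online learning on a stream: the classifier is updated
# one mini-batch at a time. (This is scikit-learn, not Jubatus's own API.)
rng = np.random.default_rng(0)
clf = SGDClassifier()
classes = np.array([0, 1])

for step in range(100):                            # each iteration = one mini-batch
    X = rng.normal(size=(32, 5))                   # 32 events, 5 features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labelling rule
    clf.partial_fit(X, y, classes=classes)         # incremental model update

X_test = rng.normal(size=(5, 5))
print(clf.predict(X_test))
```

Distributed systems such as Jubatus add a further step, periodically mixing the locally updated models across nodes, which the single-process sketch above does not attempt to show.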
This document discusses how Oracle Data Integrator 12c (ODI12c) can bridge the gap between big data and enterprise data. It allows users to integrate both types of data through a single unified tool. Key features include application adapters for Hadoop that enable native Hadoop integration, loading and transforming of big data, and integrated platforms and real-time analytics to simplify, optimize, and extend the value of big data.
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc. (see the sketch after this list)
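A minimal sketch of what that acceleration looks like from the data scientist's side, assuming the virtualization layer exposes its logical views through an ODBC data source; the DSN name, credentials, and the customer_360 view are placeholders rather than Denodo-specific objects.

```python
import pandas as pd
import pyodbc

# Placeholder DSN, credentials, and view name; the point is that the data
# scientist queries one massaged logical view instead of wrangling raw sources.
conn = pyodbc.connect("DSN=DV_ANALYTICS;UID=analyst;PWD=secret")

df = pd.read_sql(
    "SELECT customer_id, churn_score, last_purchase_date FROM customer_360",
    conn,
)
print(df.head())
conn.close()
```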
This document discusses the evolution of cluster computing and resource management. It describes how:
1) Early clusters were single-purpose and used technologies like MapReduce. General purpose cluster OSes like YARN emerged to allow multiple applications on a cluster.
2) YARN improved on Hadoop by decoupling the programming model from resource management, allowing more flexibility and better performance/availability.
3) REEF aims to further improve frameworks by factoring out common functionalities around communication, configuration, and fault tolerance.
Introduction to Cloud computing and Big Data-Hadoop - Nagarjuna D.N
Cloud Computing Evolution
Why Cloud Computing needed?
Cloud Computing Models
Cloud Solutions
Cloud Jobs opportunities
Criteria for Big Data
Big Data challenges
Technologies to process Big Data - Hadoop
Hadoop History and Architecture
Hadoop Eco-System
Hadoop Real-time Use cases
Hadoop Job opportunities
Hadoop and SAP HANA integration
Summary
TechWise with Eric Kavanagh, Dr. Robin Bloor and Dr. Kirk Borne
Live Webcast on July 23, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=59d50a520542ee7ed00a0c38e8319b54
Analytical applications are everywhere these days, and for good reason. Organizations large and small are using analytics to better understand any aspect of their business: customers, processes, behaviors, even competitors. There are several critical success factors for using analytics effectively: 1) know which kind of apps make sense for your company; 2) figure out which data sets you can use, both internal and external; 3) determine optimal roles and responsibilities for your team; 4) identify where you need help, either by hiring new employees or using consultants; 5) manage your program effectively over time.
Register for this episode of TechWise to learn from two of the most experienced analysts in the business: Dr. Robin Bloor, Chief Analyst of The Bloor Group, and Dr. Kirk Borne, Data Scientist, George Mason University. Each will provide their perspective on how companies can address each of the key success factors in building, refining and using analytics to improve their business. There will then be an extensive Q&A session in which attendees can ask detailed questions of our experts and get answers in real time. Registrants will also receive a consolidated deck of slides, not just from the main presenters, but also from a variety of software vendors who provide targeted solutions.
Visit InsideAnalysis.com for more information.
Deutsche Telekom and T-Systems are large European telecommunications companies. Deutsche Telekom has revenue of $75 billion and over 230,000 employees, while T-Systems has revenue of $13 billion and over 52,000 employees providing data center, networking, and systems integration services. Hadoop is an open source platform that provides more cost effective storage, processing, and analysis of large amounts of structured and unstructured data compared to traditional data warehouse solutions. Hadoop can help companies gain value from all their data by allowing them to ask bigger questions.
This document describes two pedagogical strategies used in a school guidance class: experience-based learning and flexible learning. Experience-based learning involves analysing practical situations in order to construct new knowledge, while flexible learning adapts to the individual characteristics of each student. Other strategies are also mentioned, such as flipped learning and the use of ICT tools. The overall objective is to achieve meaningful learning in...
Big Data @ CBS for Fontys students in Eindhoven - Piet J.H. Daas
This document summarizes the experiences of Statistics Netherlands with big data. It discusses two types of data - primary data from their own surveys and secondary data from other sources like administrative records and big data. It provides examples of exploratory big data studies conducted using road sensor data, mobile phone data, and social media data. It finds that combining IT skills with statistical methodology is important for working with big data. Skills in data science, machine learning, and extracting information from diverse sources like text and images are needed. The document also discusses lessons learned regarding the types of big data, accessing and analyzing large volumes of data, dealing with noisy and unstructured data, and moving beyond simple correlation.
Chapter 11 International Finance Management - Piyush Gaur
International banks provide various services to facilitate international trade and foreign exchange transactions for their clients. These services include trade financing, foreign exchange, hedging currency risk, and consulting. International banks have different types of offices depending on the country's regulations, including correspondent banks, representative offices, branches, subsidiaries, affiliates, and offshore centers. The international debt crisis of the 1980s was caused by developing countries taking on large debts from international banks that they struggled to repay when oil prices collapsed. The crisis was eventually resolved through debt restructuring and new types of bonds.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
This document presents the different curricular areas and competencies that make up the school curriculum. It includes areas such as personal-social development, psychomotor skills, communication, Spanish as a second language, discovery of the world, mathematics, and science and technology. It also describes the foundations and the educational and psycho-pedagogical principles of the national curriculum design, as well as the process of curriculum diversification at the regional, local and institutional levels. Finally, it presents the characteristics of a national curriculum design that can be diversified, such as being flexible...
USGFX is an award-winning Australian forex broker allowing clients to trade forex, CFDs and commodities safely and securely online. It also provides traders with the best market research, trading tools & signals and a structured educational program. USGFX is regulated by ASIC and is headquartered in Sydney.
https://www.usgfx.com/
Using Road Sensor Data for Official Statistics: towards a Big Data Methodology - Piet J.H. Daas
This document discusses using road sensor data for official statistics in the Netherlands. It describes challenges around dealing with large volumes of data, creating historical time series, and ensuring accuracy. A statistical process is outlined that cleans, transforms, selects, estimates from and frames the raw road sensor data, which records over 230 million vehicle counts per day. Key steps include selecting only necessary variables from valid data on main routes, putting daily records together, cleaning using recursive Bayesian estimation and a hidden Markov model, and estimating traffic indices from the cleaned data.
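To give a flavour of the recursive Bayesian cleaning step, here is a toy two-state hidden Markov model filter that tracks whether a road sensor is working or faulty from a stream of minute-level observations; the transition and emission probabilities are invented for illustration and are not the values used by Statistics Netherlands.

```python
import numpy as np

# Two hidden states for a road sensor: 0 = working, 1 = faulty.
P_trans = np.array([[0.95, 0.05],   # P(next state | current state)
                    [0.10, 0.90]])
# Observation per minute block: 1 = plausible vehicle count, 0 = missing/implausible.
P_emit = np.array([[0.98, 0.02],    # working sensor: usually plausible
                   [0.30, 0.70]])   # faulty sensor: often missing/implausible

def forward_filter(observations, prior=np.array([0.9, 0.1])):
    """Recursive Bayesian (HMM forward) filtering of the sensor state."""
    belief = prior.copy()
    history = []
    for obs in observations:
        belief = P_trans.T @ belief          # predict: propagate through transitions
        belief = belief * P_emit[:, obs]     # update: weight by observation likelihood
        belief = belief / belief.sum()       # normalise to a probability distribution
        history.append(belief.copy())
    return np.array(history)

if __name__ == "__main__":
    obs = [1, 1, 0, 0, 0, 1, 1]              # plausible / implausible minute blocks
    print(forward_filter(obs))               # P(working), P(faulty) per minute
```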
Dr. Piet Daas (CBS) - Statistics and large data sets - AlmereDataCapital
Presentation by Dr. Piet Daas (CBS): 'Statistics and large data sets' during the Big Data Analytics seminar on 14 June by Almere DataCapital in Almere.
CPU Optimizations in the CERN Cloud - February 2016 - Belmiro Moreira
This document discusses CPU optimizations for virtual machines (VMs) running on the CERN Cloud using OpenStack. Initially, VMs saw around 20% lower performance than bare metal hosts due to virtualization overhead. Through various optimizations including disabling EPT, enabling NUMA awareness and huge page sizes, virtualization overhead was reduced to around 3-5% on VMs compared to bare metal, allowing high-performance computing workloads to run effectively on the CERN Cloud. While challenging to detect during testing, these optimizations were critical to deploying VMs for production HPC workloads with performance near that of physical hardware.
This document provides an overview of the letterpress printing process and discusses how two Toronto printing companies, Flash Reproductions and Anstey Book Binding, have maintained letterpress departments despite the decline of letterpress in commercial printing. Both companies find that letterpress allows them to produce high-quality, specialized work and that there is a growing interest in letterpress among younger designers and consumers interested in craftsmanship. The document also profiles a book artist, George Walker, who continues to use letterpress in his studio.
DanTech is a UK-based technology company that is expanding its operations to the US through a joint venture called DanEast Ltd. DanTech will invoice US dollar exports through DanEast and import Japanese yen costs. Changes in the GBP/USD and GBP/JPY exchange rates present foreign exchange risk to DanTech's profits. The treasury manager recommends hedging techniques like money markets, forwards/futures, and options to manage this risk.
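For a feel of how a forward contract fixes the sterling value of a dollar receivable, here is a toy calculation; the amounts and exchange rates are invented for illustration and are not taken from the case study.

```python
# Toy forward-hedge arithmetic (illustrative numbers only).
# DanTech expects to receive USD 1,000,000 in 3 months and wants a known GBP value.
usd_receivable = 1_000_000
forward_gbp_usd = 1.28       # assumed 3-month forward rate agreed today (USD per GBP)
adverse_spot = 1.40          # assumed future spot if GBP strengthens

locked_in_gbp = usd_receivable / forward_gbp_usd       # value fixed by the forward
unhedged_gbp = usd_receivable / adverse_spot           # value if left unhedged

print(f"Hedged proceeds:        GBP {locked_in_gbp:,.0f}")
print(f"Unhedged at spot 1.40:  GBP {unhedged_gbp:,.0f}")
```

The forward removes the uncertainty in both directions: the company also gives up any gain it would have made had sterling weakened instead.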
Aptitude Software is a UK-based company that provides accounting software solutions for insurance, banking, and telecommunications companies. It has over 20 years of experience working with large global clients. The document discusses Aptitude Software's products which include an insurance-focused accounting solution, event-based accounting, and reconciliation features. It also provides examples of how Aptitude Software has helped clients by streamlining financial processes, integrating systems, and addressing regulatory requirements.
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI - Big Data Week
Charles Cai has more than two decades of experience and a track record of delivering global transformational programmes – from vision and evangelism to end-to-end execution – in global investment banks and energy trading companies, where he excels at designing and building innovative, large-scale Big Data systems for high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics, as Chief Front Office Technical Architect and Head of Data Science. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London, MoD CIO Symposium and others, promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with Scala, Python, C#/F#, C++, Node.js, Java, R and Haskell across Mobile, Desktop, Hadoop/Spark, Cloud, IoT/MCU and BlockChain work, and holds TOGAF9, EMC-DS and AWS CNE4 certifications.
Thesis blending big data and cloud - epilepsy global data research and inform... - Anup Singh
This document provides an overview and abstract for a thesis project aimed at building an epilepsy data research and information system leveraging big data and cloud computing. The system would create a global federated database of medical information and services to help doctors and neurosurgeons treat epilepsy patients worldwide. It would provide access to large datasets on patients to help with research and treatment decisions. The project would use technologies like Hadoop, HDFS, HBase, MapReduce running on cloud platforms to store and analyze the large amounts of structured and unstructured epilepsy data from various sources.
This is a talk about Big Data, focusing on its impact on all of us. It also encourages institutions to take a close look at providing courses in this area.
Dealing with Semantic Heterogeneity in Real-Time Information - Edward Curry
The document discusses computational paradigms for large scale open environments. It describes how environments have shifted from small controlled ones to large open ones with thousands of data sources and schemas. This requires processing information as it flows in real-time from multiple distributed sources. The talk introduces the concept of Information Flow Processing, which processes information as it streams in without intermediate storage. Examples are given of domains where this paradigm can be applied, such as financial analytics, inventory management and environmental monitoring.
This document provides an introduction to big data. It defines big data as large and complex data sets that are difficult to process using traditional data management tools. It discusses the three V's of big data - volume, variety and velocity. Volume refers to the large scale of data. Variety means different data types. Velocity means the speed at which data is generated and processed. The document outlines topics that will be covered, including Hadoop, MapReduce, data mining techniques and graph databases. It provides examples of big data sources and challenges in capturing, analyzing and visualizing large and diverse data sets.
The document provides an overview of the Dublinked Technology Workshop held on December 15th, 2011. It includes presentations on transportation data, spatial web services, linked data, and semantic data description. Breakout sessions covered topics like data publishing, discovery, web services, and advanced functions. The workshop aimed to address challenges around sharing digital data between organizations and discussed technical requirements and tools to support open government data platforms.
The document discusses big data testing using the Hadoop platform. It describes how Hadoop, along with technologies like HDFS, MapReduce, YARN, Pig, and Spark, provides tools for efficiently storing, processing, and analyzing large volumes of structured and unstructured data distributed across clusters of machines. These technologies allow organizations to leverage big data to gain valuable insights by enabling parallel computation of massive datasets.
This document outlines the course content for a Big Data Analytics course. The course covers key concepts related to big data including Hadoop, MapReduce, HDFS, YARN, Pig, Hive, NoSQL databases and analytics tools. The 5 units cover introductions to big data and Hadoop, MapReduce and YARN, analyzing data with Pig and Hive, and NoSQL data management. Experiments related to big data are also listed.
Innovation with big data – Chr. Hansen's experiences - Microsoft
In many places, Big Data is still the new and unknown topic that does not get top priority from IT, because "we don't have large volumes of data". But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics programme built on Big Data technologies from Microsoft.
SC6 Workshop 1: What can big data do for you? - BigData_Europe
Presentation by Sören Auer, Fraunhofer IAIS, Coordinator of Big Data Europe, at the first workshop of Societal Challenge 6 in the BigDataEurope project, held in Luxembourg on 18 November 2015.
http://www.big-data-europe.eu/social-sciences/
Data-intensive bioinformatics on HPC and Cloud - Ola Spjuth
The document discusses data-intensive bioinformatics and challenges with analyzing large genomic datasets on high performance computing (HPC) resources. It summarizes that storage is the biggest challenge as sequencing projects generate very large amounts of data and users do not clean up data. The strategies discussed to address this include assessing costs of storage and analysis upfront, limiting project lifetimes, moving to tiered storage, and improving efficiency. It also discusses using cloud computing resources through virtual clusters and containers to enable flexible, on-demand access and pay-per-use pricing models. Scientific workflows and microservices approaches are presented as ways to automate and orchestrate large-scale genomic analyses on distributed computing resources.
This document provides an introduction and overview of the INF2190 - Data Analytics course. It introduces the instructor, Attila Barta, and gives details on where and when the course will take place. It then provides definitions and a history of data analytics, discusses how the field has evolved with big data, and references enterprise data analytics architectures. It contrasts traditional and big-data-era data analytics approaches and tools. The objective of the course is described as providing students with the foundation to become data scientists.
This document discusses an introduction to data and big data. It defines data and differentiates it from information. It then discusses the volume, variety, and velocity characteristics of big data known as the 3Vs. The document contrasts big data with small data and notes the challenges of working with big data due to its scale. Finally, it briefly mentions some major players in big data like Google and Hadoop and lists some common tools used for big data.
The document discusses the convergence of IoT, big data, and cloud technologies. It describes how IoT generates large amounts of data with characteristics like velocity and volume that challenge traditional big data approaches. The cloud is presented as a way to provide scalable, distributed infrastructure for processing and managing IoT and big data. Two approaches for the convergence are described: a centralized approach that brings IoT data and functions into the cloud, and a distributed approach that leverages edge/fog computing to move cloud capabilities closer to devices and end users.
This document provides an introduction to big data, including definitions and key concepts. It discusses the evolution of computing systems and data storage. Big data is defined as large and complex data sets that are difficult to process using traditional methods due to the volume, variety, velocity, and veracity of the data. Examples of big data sources and applications are provided. Finally, different approaches for analyzing big data are described, including MapReduce, Hadoop, real-time analytics using databases, and complex event processing.
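Complex event processing comes down to evaluating pattern rules over a moving window of events. A plain-Python sketch of one such rule follows (an alert when more than three failed logins for the same user arrive within sixty seconds); real CEP engines express such rules declaratively, so treat this as illustrative only.

```python
from collections import deque
from datetime import datetime, timedelta

# Toy CEP rule: alert when >3 failed-login events for one user arrive within 60s.
WINDOW = timedelta(seconds=60)
recent = {}  # user -> deque of event timestamps

def on_event(user, timestamp):
    q = recent.setdefault(user, deque())
    q.append(timestamp)
    while q and timestamp - q[0] > WINDOW:   # drop events outside the window
        q.popleft()
    if len(q) > 3:
        print(f"ALERT: {len(q)} failed logins for {user} within 60s")

t0 = datetime(2014, 1, 1, 12, 0, 0)
for i in range(6):
    on_event("bob", t0 + timedelta(seconds=10 * i))
```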
This document discusses big data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional methods due to their size. It describes the characteristics of big data including volume, variety, velocity, variability, and complexity. It also discusses challenges of big data such as data location, volume, hardware resources, and privacy. Popular tools for big data mining include Hadoop, Apache S4, Storm, Apache Mahout, and MOA. Hadoop is an open source software framework that allows distributed processing of large datasets across clusters of computers. Common algorithms for big data mining operate at the model and knowledge levels to discover patterns and correlations across distributed data sources.
Big data is the term for collections of data sets so large and complex that they become difficult to process using traditional data-processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. To spot business trends, prevent diseases, anticipate conflict, and so on, we require larger data sets than the smaller ones used before. Big data is hard to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. This paper presents an overview of the Hadoop architecture, the different tools used for big data, and its security issues.
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
Presented at the ACEMS workshop at QUT in February 2015.
Credits: whole project team (names listed in the first slide).
Approved by CSIRO to be shared externally.
Similar to Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visualiseren
Slides by Karel Thönissen (Garabit). Security at the very highest level: how do I protect state secrets?
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Slides by Steven van der Linden (director of Qforce). Which organisational measures do I need to take as a hospital to have my data stored externally in a responsible way?
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Slides by Maarten Stultjens (Elephant Security). How do I, as a hospital, arrange my authorisations in a cloud environment?
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Slides by Sampo Kellomäki (CTO Synergetics). Data use via the Trustplatform and Privacy by Design.
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Slides by Peter Kits (ICT lawyer at Holland Van Gijzen). Relevant new developments in privacy and security laws and regulations.
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Prof. mr. Sijmons (Universiteit Utrecht) @ PIDS seminarAlmereDataCapital
Slides by Prof. Jaap Sijmons (professor of Health Law, Utrecht University). Current laws and regulations do not prevent external data/cloud storage in healthcare! And who actually owns that healthcare data?
Presented during the Privacy, Identity & Security (PIDS) seminar of Almere DataCapital; see www.almeredatacapital.nl.
Roland Haeve (Atos): 'Using the Cloud for Big Data Analytics'AlmereDataCapital
Presentation by Roland Haeve (Atos): 'Using the Cloud for Big Data Analytics' during the Big Data Analytics seminar of Almere DataCapital on 14 June in Almere.
Gerard Jansen (CEO Alan Turing Institute) - Alan Turing Institute: brengt dat...AlmereDataCapital
Presentation by Gerard Jansen (CEO Alan Turing Institute) - 'Alan Turing Institute: brengt data tot leven' (bringing data to life) during the Big Data Analytics seminar on 14 June in Almere
Bert Reijmerink (Genalice) - Hoe technologie bijdraagt aan een betere behande...AlmereDataCapital
Presentation by Bert Reijmerink (Genalice) - 'Hoe technologie bijdraagt aan een betere behandeling van kankerpatiënten' (how technology contributes to better treatment of cancer patients) during the Big Data Analytics seminar on 14 June in Almere
Carlijn Nouwen (McKinsey) - Keynote: Big Data in de ZorgAlmereDataCapital
The presentation by Carlijn Nouwen (McKinsey) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Sjaak van der Pouw (Siemens Healthcare) - Beeldexplosie: de mogelijkheden van...AlmereDataCapital
The presentation by Sjaak van der Pouw (Siemens Healthcare) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
The presentation by Nicky Hekster (IBM) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
The presentation by Freek Bomhof (TNO) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Harro Stokman (Euvision) - Big Brother Watches Big DataAlmereDataCapital
The presentation by Harro Stokman (Euvision) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Arjan Hassing (Ernst & Young) - Kosten besparen op big data storageAlmereDataCapital
The presentation by Arjan Hassing (Ernst & Young) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Lex Pater (Flevoziekenhuis) - Slim omgaan met ziekenhuisdataAlmereDataCapital
The presentation by Lex Pater (Flevoziekenhuis) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Prof. Ard den Heeten (LRCB) - Brondata: kennis uit ruwe dataAlmereDataCapital
The presentation by Prof. Ard den Heeten (LRCB) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Peter Walgemoed (Carelliance) - Businessmodels for Big DataAlmereDataCapital
The presentation by Peter Walgemoed (Carelliance) at the conference 'Big Data in de Zorg' of 23 November 2011 in Almere. At this conference the official starting signal was given for Almere DataCapital and the Dutch Health Hub.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images (see the sketch after this list)
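As a hedged sketch of what "SBOMs plus automated policy checks on container images" can look like as a pipeline step, the snippet below drives the open-source Anchore CLIs syft and grype from Python via subprocess; the image name and severity threshold are hypothetical, and the exact CLI flags should be checked against the tools' documentation:

```python
# Sketch of a pipeline step: generate an SBOM for an image, scan it, and fail
# the build if any vulnerability at or above a chosen severity is found.
import json
import subprocess
import sys

IMAGE = "registry.example.mil/party-barge/app:1.2.3"   # hypothetical image
FAIL_ON = {"High", "Critical"}

# 1. SBOM as a build artifact (syft).
sbom = subprocess.run(["syft", IMAGE, "-o", "spdx-json"],
                      capture_output=True, text=True, check=True).stdout
with open("sbom.spdx.json", "w") as f:
    f.write(sbom)

# 2. Vulnerability report (grype), parsed as JSON.
report = subprocess.run(["grype", IMAGE, "-o", "json"],
                        capture_output=True, text=True, check=True).stdout
matches = json.loads(report).get("matches", [])

# 3. Simple policy gate.
bad = [m for m in matches if m["vulnerability"]["severity"] in FAIL_ON]
print(f"{len(matches)} findings, {len(bad)} at blocking severity")
sys.exit(1 if bad else 0)
```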
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
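Outside the UiPath tooling itself, the "generative AI proposes test cases" idea can be sketched with the OpenAI Python client. This is a minimal, hedged example; the model name, prompt, and the idea of reviewing the output before importing it into a test suite are illustrative assumptions, not the webinar's actual integration:

```python
# Sketch: ask a generative model to propose test cases for a workflow description,
# which a tester could then review and turn into automated tests.
from openai import OpenAI

client = OpenAI()   # expects OPENAI_API_KEY in the environment

workflow = "Invoice approval: read PDF, extract total, route totals over 10k EUR to a manager."
response = client.chat.completions.create(
    model="gpt-4o-mini",          # illustrative model choice
    messages=[
        {"role": "system", "content": "You write concise software test cases."},
        {"role": "user", "content": f"Propose 5 test cases for this workflow:\n{workflow}"},
    ],
)
print(response.choices[0].message.content)
```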
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to immerse yourself in a story of interoperability, open standards and formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several events, migrations and training related to LibreOffice. Previously she worked on LibreOffice migrations and training courses for various public administrations and private organisations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (hence her nickname deneb_alpha).
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
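The "directory watcher" style of trigger described above can be illustrated outside FME with a small Python sketch using the watchdog library; the watched folder and the triggered action are hypothetical, and in FME this role is played by the built-in trigger:

```python
# Sketch of a directory-watcher trigger: when a new file appears, kick off a job.
import time
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class NewFileTrigger(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            print(f"trigger fired for {event.src_path}")   # here: submit a workspace/job

observer = Observer()
observer.schedule(NewFileTrigger(), path="./incoming", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```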
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
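The pre-process / run inference / post-process structure of such an edge pipeline can be sketched with ONNX Runtime. This is a generic illustration, not the Nx AI Manager API; the model file, input size and label list are hypothetical:

```python
# Minimal edge-inference pipeline sketch: pre-process, run the model, post-process.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("classifier.onnx")          # hypothetical converted model
input_name = session.get_inputs()[0].name

def preprocess(frame):
    # Scale to [0, 1] and reorder to the NCHW float32 layout the model is assumed to expect.
    x = frame.astype(np.float32) / 255.0
    return x.transpose(2, 0, 1)[np.newaxis, ...]

def postprocess(logits, labels=("ok", "defect")):
    # Pick the highest-scoring class (assumes a two-class model for illustration).
    return labels[int(np.argmax(logits))]

frame = np.random.randint(0, 255, size=(224, 224, 3), dtype=np.uint8)  # stand-in camera frame
outputs = session.run(None, {input_name: preprocess(frame)})
print(postprocess(outputs[0]))
```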
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
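The pattern described here, grounding an AI answer in relationships retrieved from the graph rather than letting the model guess, can be sketched with the Neo4j Python driver. This is a generic illustration; the connection details, node properties and Cypher query are hypothetical and not CAG's actual schema:

```python
# Sketch of graph-grounded retrieval: fetch related facts from Neo4j first,
# then hand only those facts to a language model as context for the answer.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def facts_about(term):
    query = (
        "MATCH (a {name: $term})-[r]->(b) "
        "RETURN a.name AS source, type(r) AS rel, b.name AS target LIMIT 25"
    )
    with driver.session() as session:
        return [f"{rec['source']} {rec['rel']} {rec['target']}"
                for rec in session.run(query, term=term)]

context = facts_about("Terminal 3")          # hypothetical entity
prompt = "Answer using only these facts:\n" + "\n".join(context)
print(prompt)                                 # would be sent to the generative model
```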
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visualiseren
1. Big Data: Visualization
Dr. Maurice Bouwhuis
SARA National High Performance Computing Services
Big Data analytics– Almere 14-06-2012
2. About SARA: Our Mission
Our mission is to support innovation.
SARA, the national HPC center, has about ~170 FTEs in 2 locations (Amsterdam and Almere).
The mission of SARA is 2-fold:
- Supporting research in the Netherlands [SARA BV for Science & Innovation]
- Offering commercial high-end ICT services [Vancis BV for adVANCed Ict Services]
SARA works closely together with SURF.
(Photo captions: Offices in Amsterdam; Data Center in Amsterdam; Data Center in Almere)
Big Data analytics– Almere 14-06-2012
3. Almere Big Data Capital
Big Data analytics– Almere 14-06-2012
4. Big Data: The Deluge
Big Data analytics– Almere 14-06-2012
5. What is Big Data?
“I cannot define it, but I know it when I see it”
My Big Bear now
My Big Bear then
Big Data analytics– Almere 14-06-2012
6. Wikipedia Defining Big Data
Beyond commonly used ICT:
“Data sets whose size is beyond the ability of commonly used tools to capture, manage, and process the data within a tolerable elapsed time.”
Big Data analytics– Almere 14-06-2012
7. Big Data as defined by IDC (2011)
“Bringing together vast amounts of data from public and private sources, combined with the intuition of business and thought leaders and the speed and affordability of today's computers.”
(IDC October 2011)
Big Data analytics– Almere 14-06-2012
8. Defining Big Data: The 3 V’s
Volume
- Large amounts
- Massive historical archives
- Valuable for data mining
Velocity
- At very high rates (sensors, streams, social media, …)
- Valuable in its “fresh” state
Variety
- Structured, semi-structured and unstructured
- Variety also in Value
Big Data analytics– Almere 14-06-2012
9. 2 new V’s: Viscosity & Virality
Big Data analytics– Almere 14-06-2012
10. Big Data Drivers
- “Internet of Things”
- Commoditization of HPC
- Human dynamics can be easily stored and queried with Apache Hadoop
  - HDFS (storage): Hadoop Distributed File System
  - MapReduce (processing): high-performance parallel data processing
  - Scalable & self-healing
So, big data is driven by large-scale data collection, storage and (information) processing.
Big Data analytics– Almere 14-06-2012
11. Data Deluge @SARA
It has always been there…
Scientific Data Deluge:
- Observations (e.g. LOFAR, Lifewatch)
- Large-scale simulation (e.g. astrophysics, climate modeling)
- Experiments (e.g. Large Hadron Collider, DNA sequencers)
- Multi-petabytes of data growth at SARA each year
- Single datasets of 10-100 terabytes and larger
- Multidisciplinary use of data
- Science needs insight, not only data
(Image captions: e-Science and Technology Infrastructure for the Large Hadron Collider; Biodiversity Data and Ecosystem Research; Low Frequency Array; Biobanking and Biomolecular Resources Research Infrastructure)
Big Data analytics– Almere 14-06-2012
12. Big Data Ultimate Challenge: How to get insight?
As volume, variety and velocity of data increase, the use of visualization is imperative to help get the insight needed for an increasingly data-driven future.
Big Data analytics– Almere 14-06-2012
13. Some Applications of Big Data: Astronomy
Astron: 1 exabyte per day of raw data, 2 times the WWW traffic per day.
SKA: 300 - 1500 petabytes of storage per year! 20 times the LHC.
Big Data analytics– Almere 14-06-2012
14. Some Applications of Big Data: Healthcare
Erasmus MC
Diagnostics: 4% of costs, 72% of decisions.
Opportunities for disease management:
1) New classification of patients for better diagnostics & combined therapy
2) Assessing and managing risks
Big Data analytics– Almere 14-06-2012
15. Some Applications of Big Data: Water Management
Total length of primary flood defenses in the Netherlands: 2875 km, spread over 90 dike “rings”…
Decision Support System: integration of sensor data + AI, simulation results, maps, weather, ships, roadwork, traffic, Twitter, GSM, location of emergency services, ...
Big Data analytics– Almere 14-06-2012
16. Some Applications of Big Data: Infrawatch, Hollandse Brug
145 sensors x 100 Hz x 60 seconds x 60 minutes x 24 hours x 365 days = big data
(Arno Knobbe, LIACS, 2011, http://infrawatch.liacs.nl)
Big Data analytics– Almere 14-06-2012
17. Some Applications of Big Data: Ecology
Citizen Science: >20,000 users, >50M observations.
Bird radars: streaming data, many terabytes.
GPS-tracking: streaming data, world-wide projects.
Massive amounts of complementary, multi-scale information that cannot be “seen” in the field.
Big Data analytics– Almere 14-06-2012
18. Some Applications of Big Data: eScience is also (big) data mining
- Cognition: image analysis and data exchange
- Climate Research: Regional Sea-Level
- Food Specific Ontologies for Food Focused Text Mining
- Chemical Metabolomics Data Analysis
- Biography Portal: interconnections, trends, geographical maps and time lines
- Data-Intensive Modeling of the Global Water Cycle
(by SURF & NWO)
Big Data analytics– Almere 14-06-2012
19. CosmoGrid Case: The Need for Integrated e-Infrastructure Services
- A cosmological N-body simulation with 8,589,934,592 particles: formation of large structures of dark matter
- Dutch Computing Challenge Project & DEISA Extreme Computing Initiative: DCCP 2008 - 2009 / DECI 2009
- Run 1 + 2: 4.25 M core hours of computing, 110 TB of data
- Huygens Amsterdam + Cray XT4 Tokyo, coupled via light path, and Amsterdam + Tokyo + Helsinki + Edinburgh
- High-resolution remote data visualization on a tiled panel display
- Advanced support in porting and optimization, visualization, data storage, networking and project management
All infrastructure elements and their integration are crucial
Visit SURF 7-6-2012
Big Data analytics– Almere 14-06-2012
20. Visualization @ SARA: more than 20 years of experience and support
(Timeline of capabilities: scientific visualization support; scientific & industrial visualization support; high-resolution visualization support; rendering, animations and slides; virtual reality; remote visualization & streaming service; remote visualization and collaboration support)
Big Data analytics– Almere 14-06-2012
21. How are we Coping with Big Data?
HPC centers, universities, and in recent years Internet companies like Yahoo!, Facebook and of course Google are pioneers (with lots of knowledge exchange, by the way).
We collect big data, store it, and we have the knowledge to interpret it.
What tools do we have to pull this out?
Big Data analytics– Almere 14-06-2012
22. ‘Collaboratorium’: New visualization and collaboration facility @SARA
- Visualization of big shared data: data and trends
- Also for improving business and science models and computational debugging
- PowerPoint, video conferencing, telepresence, 3D (stereo) projection
- Based on proven technology from SARA and partners EVL and Calit2 (San Diego)
(Diagram labels: videoconference, laptops 1-3, website in browser, data from workstations 1 and 2, remote camera)
Big Data analytics– Almere 14-06-2012
23. Visualization @ SARA – Remote Visualization
Remote visualization service:
- Provide dedicated visualization resources in the SARA data center: rendercluster and visualization software (i.e. ParaView, VisIt, VTK, VMD, Blender, ...)
- Embedded in the national e-Infrastructure
- Visualization resource has direct access to storage at SARA
- Avoid large data transfers over the network (esp. the Internet) by running visualization applications remotely
- Pixel output/remote desktop transferred to the user, instead of files
- Application support for parallel rendering
Big Data analytics– Almere 14-06-2012
24. Big Data Requires Big Computing
What benefits could exascale computing bring? It will enable discovery in many areas of science. "Aerospace engineering, astrophysics, biology, climate modeling and national security all have applications with extreme computing requirements."
Big Data analytics– Almere 14-06-2012
25. Compute Ecosystem @SARA
1. Low-latency, high-bandwidth capability computing (Huygens)
2. Capacity compute clusters (LISA)
3. Loosely coupled compute grids (Big Grid)
4. Sector, private and public clouds (including our HPC Cloud) and Beehub storage
5. Special-purpose (GPU) clusters
6. Big Data Apache Hadoop systems (since 2009)
Big Data analytics– Almere 14-06-2012
26. Big Data Eco-System @SARA
DevOps, Programming algorithms, Domain knowledge
Big Data analytics– Almere 14-06-2012
27. To Summarize: Big Data Is Rapidly Changing our Life
Big data is changing science, medicine, business, and technology.
A whole new way of science: correlation supersedes causation, coherent models or unified theories…
The biggest challenge for science & business is not storing or processing data but how to make sense of it without affecting our privacy.
Big Data analytics– Almere 14-06-2012
28. Big Data… Big Enough?
Thank You
Big Data analytics– Almere 14-06-2012
Ladies and gentlemen, a very good morning. It is a great pleasure for me to be able to give the first keynote at this Big Data day here in Almere.
A few words about SARA.
Almere understands the importance of scale like no other. Almere wants to become twice as large and has been working on many fronts for years to achieve this. ICT has always played an important role in Almere. SARA established itself in Almere at the beginning of this century, and the first broadband pilot also took place in Almere. With the Almere DataCapital programme, Almere wants to position itself as the Dutch Media and Health Hub. In time, the position as data capital should bring the municipality 2,500 new jobs.
The Economist was the first to make the world aware of the potential and importance of big data with its famous article on the data deluge. In December 2011, big data was the main theme of the World Economic Forum in Davos; central to it was the importance of data intelligence for decision making in political, social and economic areas. In March of this year IDC published a forecast of the growth of the big data market: no less than 40% growth per year, 7 times the average growth of the ICT market.
But first, what do we mean by big data? First the famous statement of the American senator, albeit in a completely different context: “I cannot define it, but I know it when I see it”… We also know that the notion of large amounts of data is relative. What counts as a lot of data today may be considered normal or even small tomorrow.
According to Wikipedia, it becomes big data when the amount of data is larger than what commonly used tools can collect, manage and process within an acceptable time frame.
According to IDC, big data is about bringing together large amounts of data from public and private sources, combined with the intuition and ideas of business leaders and the capabilities of affordable ICT.
Big data is usually characterized by the 3 V's. Volume: large amounts of data. The speed of data production also produces gigantic log files and archives, all very valuable for data mining. Velocity: batch as well as near-time and real-time streams. Big data is characterized by very high speeds (sensors, media streams). It is very important to act quickly, and the data also holds a lot of value while it is still so-called fresh. Variety: big data is structured, semi-structured but also unstructured. Different sources, old and new, deliver a great diversity of data formats, which is why old DBMS techniques alone are no longer sufficient.
Two new V's have recently been added to these. Viscosity: viscosity is a measure of the resistance to getting through the data and turning data into information and thus insight. Better techniques are required, such as better streaming, integration and processing techniques. Virality: virality describes how quickly the data spreads between people (P2P).
In summary, the most important drivers are: the Internet of Things; HPC has become a commodity (data acquisition, storage and processing); and the third breakthrough is the rise of Hadoop, an open-source project of the Apache Software Foundation. Hadoop offers a reliable and, above all, affordable data storage solution with the Hadoop Distributed File System (HDFS) and a high-performance parallel data processing technique called MapReduce.
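To make the MapReduce model mentioned above concrete, a minimal hedged sketch in plain Python standing in for a Hadoop job (the input documents are made up):

```python
# Minimal sketch of the MapReduce model (word count), using plain Python rather
# than a real Hadoop cluster; it only illustrates the map, shuffle and reduce phases.
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, as a mapper would.
    for word in line.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Sum all counts for one key, as a reducer receives them after the shuffle.
    return word, sum(counts)

documents = ["big data requires big computing", "big data needs insight"]

# Shuffle: group intermediate pairs by key.
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)   # e.g. {'big': 3, 'data': 2, ...}
```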
This big data deluge is nothing new at SARA. Projects such as LOFAR and LifeWatch were already delivering large amounts of data, as were large-scale simulations in astrophysics and climate modelling. And of course the experiments at CERN in Geneva alone deliver over 2 petabytes per year. SARA is used to collecting, managing and processing many petabytes per year. Some of these datasets are tens to hundreds of terabytes in size. But the challenge is to extract insight from the data.
The key question with big data is no longer how the data should be stored and processed, but above all how to obtain insight from the mountain of information that comes out of it!
A few big data applications. Astronomy: astronomical data volumes from large-scale infrastructure projects such as LOFAR and the Square Kilometre Array. At Astron alone, they expect to have to process 1 exabyte (that is 1000 petabytes) per day; that is 2 times the amount of data traffic on the WWW. Within the SKA project it will come to between 300 and 1500 petabytes of storage per year, at least 20 times the amount of data generated by the LHC at CERN.
Another example comes from healthcare. Diagnostic costs are 4% of total costs, while diagnostics influence 72% of decisions. Big data, in which gene expression is combined with proteomics, screening, chemoinformatics and also textual data mining of literature and patents, delivers better classifications of patients, better diagnosis and better therapy, as well as better assessment and management of risks.
Another application is water management. The total length of primary flood defenses in the Netherlands is just under 3000 km, spread over 90 dike rings. Using big data, the project has set up a decision support system with a combination of sensors, AI, simulation data, maps, weather data, ship and road traffic data, Twitter, GSM, the location of emergency services, etc. The objective is, in the event of a calamity, to assist the emergency services in detecting problems early and saving as many people as possible.
Another is the Infrawatch project; here we see the study on the Hollandse Brug. Since 2008 the bridge has been equipped with a large collection of sensors that continuously measure the traffic load on the bridge and the response of the infrastructure to it. The data collected in this way amounts to around 11 GB per day.
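A minimal sketch working out the samples-per-day implied by the slide's "145 x 100 x 60 x 60 x 24 x 365" formula and the roughly 11 GB/day quoted above; the derived bytes-per-sample is only an implication of those two numbers, not a documented value:

```python
# Rough arithmetic behind the Infrawatch figures:
# 145 sensors sampled at 100 Hz, around the clock.
sensors = 145
sample_rate_hz = 100
samples_per_day = sensors * sample_rate_hz * 60 * 60 * 24      # ~1.25 billion
samples_per_year = samples_per_day * 365                       # ~457 billion

reported_gb_per_day = 11         # figure quoted in the speaker notes
bytes_per_sample = reported_gb_per_day * 1e9 / samples_per_day  # ~9 bytes, implied

print(f"{samples_per_day:,} samples/day, {samples_per_year:,} samples/year")
print(f"~{bytes_per_sample:.1f} bytes per sample at {reported_gb_per_day} GB/day")
```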
Big data has also made its entrance in ecology. Here citizens are involved as well, in particular with observations that have to be made all over the country. We are then talking about more than 20,000 users and more than 50 million observations. Examples are the use of radars for tracking birds, the use of GPS systems, and a wealth of extra information from sensors and cameras that cannot be obtained from observations alone.
There are more big data applications in which SARA is involved, such as climate research (for example on sea level), image processing in the field of cognition, text processing and ontologies for food, text mining of biographies, chemical metabolomics, and modelling of water cycles on a global scale. All are characterized by large amounts of data from simulations, modelling, text and image processing, and also sensors and recorders.
An example that highlights the integrated infrastructure. Problem statement: simulate the movement of galaxies over time and compare the simulations with the observed structures we see today; an N-body simulation in which the bodies are galaxies. Infrastructures used: compute (several supercomputers around the world simultaneously), storage, networks (to connect the computers to each other and deliver data back), visualization (gaining knowledge by making data visible), and support (as the connecting factor to make everything run efficiently and to bring in expertise). This party is also involved in an EyR-3 application (Seintra, VU).
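For readers unfamiliar with the term, a toy N-body sketch in Python/NumPy; this is a direct-sum illustration only, nothing like the 8.6 billion particles or the specialised codes used in CosmoGrid:

```python
# Toy direct-sum N-body step: a handful of "galaxies" under mutual gravity.
import numpy as np

def gravity_step(pos, vel, mass, dt=0.01, G=1.0, softening=1e-3):
    """Advance particle positions and velocities by one simple time step."""
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]        # pairwise offsets
    dist3 = (np.sum(diff**2, axis=-1) + softening**2) ** 1.5    # softened |r|^3
    acc = G * np.sum(diff * (mass[np.newaxis, :, np.newaxis] / dist3[:, :, np.newaxis]), axis=1)
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

rng = np.random.default_rng(0)
pos = rng.normal(size=(16, 3))        # 16 bodies in 3D
vel = np.zeros_like(pos)
mass = np.ones(16)

for _ in range(100):
    pos, vel = gravity_step(pos, vel, mass)
print(pos[:3])
```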
Visualization is indispensable here. SARA has a long history in visualization, going back more than 20 years: first with one of the first visualization centers in the Netherlands, then with the first CAVE in Europe, then with tiled panel displays, in particular for remote visualization of large amounts of data, and recently with the opening of the Collaboratorium.
How do we deal with big data? HPC centers, universities and, in recent years, Internet companies such as Yahoo!, Google and Facebook are the pioneers in tackling big data; a lot of knowledge exchange takes place between those parties. At SARA we have the knowledge and experience in collecting and storing both structured and unstructured big data, as well as in processing the data and interpreting it. Which tools do we need for this?
The Collaboratorium is extremely well suited for the visualization of large amounts of data and data trends. It can also be used to improve business and scientific models and to debug big data software. It is a high-resolution tiled panel display with the ability to use PowerPoint, video conferencing, 3D stereo projection and telepresence by several users, also remotely and simultaneously. The tiled panel display is also connected to SARA's national infrastructure. The Collaboratorium uses technology that is being developed internationally within a collaboration in which SARA participates.
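The remote-visualization idea on slide 23, rendering next to the data and shipping pixels rather than files, can be illustrated with a minimal, hedged sketch using matplotlib's off-screen Agg backend; matplotlib merely stands in for the ParaView/VisIt render cluster mentioned on the slide, and the file name and data are hypothetical:

```python
# Minimal stand-in for "render remotely, ship pixels instead of files":
# render a large dataset off-screen where it lives and save only a small image.
import matplotlib
matplotlib.use("Agg")            # off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(1).normal(size=(2_000_000,))   # stand-in "big" dataset

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(data, bins=200)
ax.set_title("Rendered next to the data")
fig.savefig("view.png", dpi=100)     # only this small image travels to the user
```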
Big data also requires large-scale storage and processing, hence the race towards exascale computing. Fast data processing delivers breakthroughs in various fields of science and business and is increasingly important for social and societal questions such as the macro-economy and national security.
Over the years, SARA has built up an ecosystem for, among other things, the storage and processing of big data with data and task parallelism. The ecosystem covers the most common compute and data problems in science and business: from so-called low-latency, high-bandwidth capability computing for solving one large, indivisible problem at once on the national supercomputer, to solving a very large but divisible problem on the national capacity computing cluster LISA, or more loosely coupled on a cloud or the grid. SARA also operates a separate compute cluster with Hadoop data storage and processing capacity for big data.
Collaboration between DevOps (that is, developers who are also operators), programmers and domain specialists is crucial in tackling big data. DevOps take care of the parallel data storage with Hadoop, the programmers take care of the data processing with MapReduce, and finally the domain experts take care of the domain application and the data interpretation. For big data application development it is important to have experts from all three domains in house.
Samenvattend: Big data is heel snel onze leven aan het veranderen, van wetenschap tot, medicijnen, to business en technologie. Big data leidt tot een heel nieuwe methode van wetenschap waar correlaties belangrijker zijn dan klassieke causale verbanden zeker in levenswetenschappen. De grootste uitdaging van Big Data is het verkrijgen van inzichten met respect voor onze privacy en keuzevrijheid.