The document provides an overview of the BEXIS2 platform for collaborative data management. It describes the key steps in the workflow for registering an account, creating a data structure, uploading datasets, and searching or accessing uploaded data. The platform is designed to integrate tabular data from research projects and promote open data sharing. Users can download Excel templates to input data, provide metadata to describe datasets, and make data publicly available or share access with other users.
This document summarizes the key new features and capabilities in Neo4j 4.0. It discusses how Neo4j 4.0 provides unlimited scalability through sharding and federation. It also introduces a fully reactive architecture and granular security controls for privacy. Finally, it highlights how Neo4j Desktop can help developers work with Neo4j from idea to production.
Christopher Gutteridge's slides from Connected Data London. Christopher, an Open Data Architect at the University of Southampton, presented why and how people should employ an Open Data strategy at their organisation.
Video in French at https://www.youtube.com/watch?v=9LNnNh63rBI
Sizing an Elasticsearch cluster involves many dimensions. In this presentation we go through the different elements and features you should consider to handle big and varying loads of log data.
pandas: Powerful data analysis tools for Python (Wes McKinney)
Wes McKinney introduced pandas, a Python data analysis library built on NumPy. Pandas provides data structures and tools for cleaning, manipulating, and working with relational and time-series data. Key features include DataFrame for 2D data, hierarchical indexing, merging and joining data, and grouping and aggregating data. Pandas is used heavily in financial applications and has over 1500 unit tests, ensuring stability and reliability. Future goals include better time series handling and integration with other Python data science packages.
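The features listed in the summary can be sketched in a few lines of pandas; the frame and column names below are invented for illustration:

```python
import pandas as pd

# Two small frames of made-up trade data to merge and aggregate.
trades = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT"],
    "qty": [100, 50, 200],
})
names = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "name": ["Apple", "Microsoft"],
})

# Merge (join) on a shared key, then group and aggregate.
merged = trades.merge(names, on="ticker")
totals = merged.groupby("name")["qty"].sum()
print(totals["Apple"])      # 150
print(totals["Microsoft"])  # 200
```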
This talk, titled "Graph-Tool: The Efficient Network Analyzing Tool for Python", was given at PyCon APAC 2014 [1] and PyCon SG 2014 [2]. It introduces Graph-Tool through a large number of code snippets.
[1] https://tw.pycon.org/2014apac
[2] https://pycon.sg/
The document discusses NAILS (Network Analysis Interface for Literature Studies), a tool for performing bibliometric network analysis and social network analysis on citation data from the literature. It describes NAILS's functionality for analyzing literature from Web of Science, including identifying the most cited authors, publications, and keywords. The document also presents a case study analyzing literature on "augmented reality education" using NAILS and its web interface HAMMER. Key outputs from the case study include visualizations of publication volume, influential authors, and topics identified through topic modeling.
The document describes the NAILS Project, which provides a tool called NAILS for performing bibliometric network analysis and social network analysis on literature. NAILS analyzes publications from Web of Science to identify the most cited authors, publications, and keywords. It also extracts citation networks and performs topic modeling. The document includes a case study analyzing literature on "augmented reality education" to demonstrate NAILS and provides examples of its output, including CSV files, graphs, and topic modeling results.
In addition to seeing the latest features in Splunk Enterprise, learn some of the top commands that will solve most search and analytics needs. Ninjas can use these blindfolded. New features will be demonstrated in the following areas: TCO and Performance Improvements, Platform Management and New Interactive Visualizations.
This document provides an introduction to big data and basic data analysis techniques. It discusses the large amounts of data being generated daily from sources like the web, social networks, and scientific projects. It also covers common data types, challenges in working with big data, and some basic statistical and data mining techniques for analyzing large datasets including classification, clustering, and association rule mining.
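As a toy illustration of one of the techniques mentioned, association rule mining boils down to counting co-occurrences; the transactions below are invented:

```python
# Toy market-basket data (invented) for computing support and confidence
# of the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"bread", "butter"} <= t)
bread = sum(1 for t in transactions if "bread" in t)

support = both / n          # fraction of all baskets containing both items
confidence = both / bread   # of baskets with bread, fraction that also have butter
print(support, confidence)  # 0.5 0.666...
```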
Filipe Paternot - Case Study: Zabbix Deployment at Globo.com (Zabbix)
This talk will cover two years of Zabbix deployment at Globo.com - the web branch of Rede Globo, a major media player in Brazil providing access to media products to more than 1.8 million visitors hourly and 45 million each day.
The case study will include migration from legacy systems, integration, templates and deployment, summing up the challenges, solutions and the knowledge gained.
Zabbix Conference 2015
OpenTSDB is used at Criteo for monitoring their large Hadoop infrastructure which includes over 2500 servers running many different services like HDFS, YARN, HBase, Kafka, and Storm. OpenTSDB was chosen because it can handle the scale of metrics collected, store metrics for long periods of time with fine-grained resolution, and is easily extensible to add new metrics. It uses HBase for storage which is optimized for the time series data stored in OpenTSDB and can scale to meet Criteo's needs of storing billions of data points and handling high query loads.
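For context, OpenTSDB ingests individual data points whose telnet-style line protocol takes a metric name, a Unix timestamp, a value, and a set of tags. A minimal sketch of building such a line (the metric and tag names here are invented):

```python
# Build an OpenTSDB telnet-style "put" line: metric, timestamp, value, tags.
def put_line(metric, timestamp, value, **tags):
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"

line = put_line("hdfs.datanode.bytes_written", 1356998400, 42.5,
                host="dn01", cluster="prod")
print(line)  # put hdfs.datanode.bytes_written 1356998400 42.5 cluster=prod host=dn01
```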
This document provides a step-by-step guide to learning R. It begins with the basics of R, including downloading and installing R and R Studio, understanding the R environment and basic operations. It then covers R packages, vectors, data frames, scripts, and functions. The second section discusses data handling in R, including importing data from external files like CSV and SAS files, working with datasets, creating new variables, data manipulations, sorting, removing duplicates, and exporting data. The document is intended to guide users through the essential skills needed to work with data in R.
The document discusses schema design patterns for MongoDB databases. It introduces common patterns like attribute, subset, computed, and approximation to address issues like large documents, working set size, repeated calculations, and high write volumes. The patterns help optimize performance, scalability, and reduce costs. The document explains each pattern's problem, solution, and benefits through examples like storing movie release dates and computed values. It encourages applying these proven patterns as building blocks to design schemas for specific use cases like e-commerce or social networking applications.
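The computed pattern mentioned above stores a pre-aggregated value alongside the raw data so reads do not repeat the calculation. A minimal sketch with plain dictionaries standing in for MongoDB documents (field names invented):

```python
# Computed pattern: keep a running total in the movie document so each
# read does not have to re-sum all screening records.
movie = {"title": "Example Movie", "viewers_total": 0}
screenings = []

def record_screening(movie, screenings, viewers):
    screenings.append({"viewers": viewers})  # raw data, written once
    movie["viewers_total"] += viewers        # computed field, kept in sync at write time

record_screening(movie, screenings, 1200)
record_screening(movie, screenings, 800)
print(movie["viewers_total"])  # 2000, read without scanning screenings
```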
I am Shubham Sharma, a Computer Science and Engineering graduate of the Acropolis Institute of Technology. I have spent around two years in the field of machine learning and currently work as a Data Scientist at Reliance Industries Private Limited, Mumbai, focusing mainly on problems related to data handling, data analysis, modeling, forecasting, statistics, machine learning, deep learning, computer vision, and natural language processing. My areas of interest are data analytics, machine learning, time series forecasting, web information retrieval, algorithms, data structures, design patterns, and OOAD.
The document discusses new features and capabilities in Neo4j 4.0, including unlimited scalability through sharding and federation, a fully reactive architecture, and new security and data privacy controls. It also introduces Neo4j Desktop for graph development workflows, Neo4j Aura cloud database service, and visualization and analytics tools for working with graph data.
Streaming ETL - from RDBMS to Dashboard with KSQL (Bjoern Rost)
Apache Kafka is a massively scalable message queue that is being used in more and more places to connect more and more data sources. This presentation introduces Kafka from the perspective of a mere mortal DBA and shares the experience of (and challenges with) getting events from the database to Kafka using Kafka Connect, including poor man's CDC using flashback queries and traditional logical replication tools. To demonstrate how and why this is a good idea, we build an end-to-end data processing pipeline: we discuss how to turn changes in database state into events and stream them into Apache Kafka, explore the basic concepts of streaming transformations using windows and KSQL, and finally ingest the transformed stream into a dashboard application.
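The windowed transformations mentioned can be sketched without Kafka at all; below, a tumbling one-minute window groups timestamped events by `timestamp // window`, much like a windowed GROUP BY in KSQL (the event data is invented):

```python
from collections import Counter

# Timestamped change events (seconds since epoch; invented data).
events = [(0, "order"), (30, "order"), (70, "payment"), (90, "order")]

WINDOW = 60  # tumbling window size in seconds

# Count events per (window start, event type).
counts = Counter()
for ts, kind in events:
    window_start = (ts // WINDOW) * WINDOW
    counts[(window_start, kind)] += 1

print(counts[(0, "order")])   # 2 events in the first window
print(counts[(60, "order")])  # 1 in the second
```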
This document outlines advice for building an applied data science portfolio. It recommends including projects that demonstrate hands-on experience with cloud computing ecosystems, tackle large datasets using distributed computing techniques, integrate different systems to build end-to-end pipelines, and go beyond presentations to write about projects through white papers, blog posts, or books. Specific project ideas are provided in areas like deploying models as APIs, using serverless functions, and building recommendation engines or simulations with big data tools. The goal is to showcase experience and learn new skills through portfolio projects.
MW2011 Grid-based Web Design presentation (Charlie Moad)
This document discusses the benefits of using grid-based web design. It provides a brief history of grid design and influential designers like Emil Ruder and Josef Müller-Brockmann. Grids offer benefits to designers, developers, and content authors by providing structure and consistency. A case study of redesigning the Indianapolis Museum of Art's website using a grid is presented. Tools for implementing grids are also reviewed. The document argues that grids will remain a relevant design approach as new devices emerge.
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...) (Wes McKinney)
This document discusses pandas, a popular Python library for data analysis, and its limitations. It introduces Badger, a new project from DataPad that aims to address some of pandas' shortcomings like slow performance on large datasets and lack of tight database integration. The creator describes Badger as using compressed columnar storage, immutable data structures, and C kernels to perform analytics queries much faster than pandas or databases on benchmark tests of a multi-million row dataset. He envisions Badger becoming a distributed, multicore analytics platform that can also be used for ETL jobs.
This document provides an agenda for a training session on AI and data science. The session is divided into two units: data science and data visualization. Key Python libraries that will be covered for data science include NumPy, Pandas, and Matplotlib. NumPy will be used to create and manipulate multi-dimensional arrays. Pandas allows users to work with labeled and relational data. Matplotlib enables data visualization through graphs and plots. The session aims to provide knowledge of core data science libraries and demonstrate data exploration techniques using these packages.
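The NumPy portion of such a session usually starts with creating and manipulating multi-dimensional arrays, for example:

```python
import numpy as np

# Create a 2x3 array and manipulate it: reshape, aggregate, broadcast.
a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
col_sums = a.sum(axis=0)        # sum down each column -> [3, 5, 7]
doubled = a * 2                 # elementwise broadcasting

print(col_sums.tolist())  # [3, 5, 7]
print(doubled[1, 2])      # 10
```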
Time series data is proliferating with literally every step that we take: think of things like Fitbit bracelets that track your every move, or financial trading data, all of it timestamped.
Time series data requires high performance reads and writes even with a huge number of data sources. Both speed and scale are integral to success, which makes for a unique challenge for your database.
A time series NoSQL data model requires flexibility to support unstructured and semi-structured data, as well as the ability to write range queries to analyze your time series data. So how can you tackle speed, scale and flexibility all at once?
Join Professional Services Architect Drew Kerrigan and Developer Advocate Matt Brender for a discussion of:
Examples of time series data sets, from IoT to Finance to jet engines
What makes time series queries different from other database queries
How to model your dataset to answer the right questions about your data
How to store, query and analyze a set of time series data points
Learn how a NoSQL database model and Riak TS can help you address the unique challenges of time series data.
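The range queries described above can be sketched over a sorted in-memory series; Riak TS itself is not required for the idea (the timestamps and values are invented):

```python
import bisect

# Time-ordered (timestamp, value) points, e.g. sensor readings (invented).
points = [(100, 20.1), (160, 20.4), (220, 21.0), (280, 19.8)]
timestamps = [ts for ts, _ in points]

def range_query(start, end):
    """Return points with start <= timestamp < end, via binary search."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return points[lo:hi]

window = range_query(150, 250)
print(window)  # [(160, 20.4), (220, 21.0)]

# Analyze the selected window, e.g. take the mean value.
avg = sum(v for _, v in window) / len(window)
print(avg)     # 20.7
```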
According to a recent Harvard Business Review study, there’s only a 43% chance that customers who have a poor experience will stick with you for the next 12 months. Contrast that to the 74% that will remain your customer if they have a great experience. Learn how Macy’s, a leading American department store chain founded in 1858 with over 750 stores in North America, is transforming their customer experience with DataStax Enterprise.
Webinar recording: https://youtu.be/CiUVxh6Ov_E
View current and past DataStax webinars: http://www.datastax.com/resources/webinars
This document provides an agenda for Part II of an SPP 2089 data management training. The agenda includes topics such as troubleshooting common data upload issues, improving dataset quality, and attaching metadata to data. Techniques for updating datasets, ensuring data consistency and completeness, linking related datasets, and adding explanatory information to datasets are discussed. The training emphasizes using the BEXIS2 data management platform to properly store, organize, and document research data over the full data lifecycle in accordance with SPP 2089 guidelines.
This document outlines an agenda for a data management training session. The full-day session will cover basics in the morning, advanced topics after lunch, and end with a question and answer period and required homework. Attendees will learn about account creation and login procedures for various research platforms, file labeling standards, and data management best practices including uploading, downloading, sharing and archiving data throughout its lifecycle. The document provides details on specific topics to be covered as well as templates and guidelines for research activities like field and column experiments.
This is the presentation part of an M.Sc. thesis in Software Engineering at Friedrich Schiller University Jena entitled “Dataset quality visualization in BEXIS2”. In this thesis, a visual overview of data quality was prototypically implemented as a new feature for the BEXIS2 Data Management System to make studying dataset quality straightforward.
This is project work required by the lecture "Semantic Web Technology" in the winter semester 2017/2018 at the Friedrich Schiller University Jena. Prof. Dr. König-Ries and Dr. Chamanara were the supervisors.
This presentation was required by the lecture "History of the Computer" in the summer semester of 2018 at the University of Jena. The text is in German.
This document discusses the history and future of data science over the past 50 years and next 50 years. It covers:
1) How data science has evolved from its origins in statistics and data analysis in the 1960s to become a broader field today, encompassing skills in software engineering, machine learning, and domain expertise.
2) The six main divisions of modern data science: data exploration/preparation, representation/transformation, computing with data, visualization, modeling, and science about data science itself.
3) How open science, data and code sharing, and empirical validation of methods will drive the field to become more reproducible, collaborative, and evidence-based over the next 50 years.
The document discusses question answering over knowledge graphs. It introduces question answering and describes how knowledge graphs can be used to answer natural language questions. It summarizes three papers: on learning knowledge graphs for question answering through dialogs, on automated template generation for question answering over knowledge graphs, and on generating knowledge questions from knowledge graphs. The document also covers the motivation for question answering, its defining characteristics, different methods such as template-based and dialog-based systems, evaluation of knowledge quality, and examples of question answering systems.
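A template-based system of the kind surveyed can be sketched as pattern matching against a triple store; the triples and templates below are invented:

```python
import re

# Tiny knowledge graph as (subject, predicate, object) triples (invented).
triples = [
    ("Jena", "locatedIn", "Germany"),
    ("Jena", "population", "110000"),
]

# Question templates, each mapped to the predicate it asks about.
TEMPLATES = [
    (re.compile(r"where is (\w+)\?", re.I), "locatedIn"),
    (re.compile(r"what is the population of (\w+)\?", re.I), "population"),
]

def answer(question):
    """Match the question against a template, then look up the triple."""
    for pattern, predicate in TEMPLATES:
        m = pattern.fullmatch(question)
        if m:
            subject = m.group(1)
            for s, p, o in triples:
                if s == subject and p == predicate:
                    return o
    return None

print(answer("Where is Jena?"))  # Germany
```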
The document discusses facilitating the discovery of public datasets. It describes Schema.org, a collaborative project to add metadata to content using microdata, RDFa or JSON-LD formats. It also discusses challenges in identifying and relating datasets, as well as properties for describing datasets, such as name, description, URL, version, and spatial/temporal coverage. An example is given of markup for a seismic hazard zones dataset using these properties.
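A minimal JSON-LD sketch of such dataset markup, using the schema.org properties named above (all values are invented for illustration):

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Seismic Hazard Zones",
  "description": "Mapped seismic hazard zones (illustrative example).",
  "url": "https://example.org/datasets/seismic-hazard-zones",
  "version": "1.0",
  "spatialCoverage": "California, USA",
  "temporalCoverage": "2000/2015"
}
```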
Measures in SQL (SIGMOD 2024, Santiago, Chile) (Julian Hyde)
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
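The idea of a measure with an evaluation context can be sketched outside SQL: a measure is a calculation bundled with a table, and the context is the set of rows it is evaluated against (the data below is invented):

```python
# A "measure" as a calculation attached to a table; the evaluation
# context is whatever filtered subset of rows it is applied to.
sales = [
    {"region": "US", "amount": 10},
    {"region": "US", "amount": 30},
    {"region": "EU", "amount": 20},
]

def total_amount(rows):
    """The measure: a reusable calculation over any row context."""
    return sum(r["amount"] for r in rows)

# Evaluate the same measure under different contexts.
everywhere = total_amount(sales)
us_only = total_amount([r for r in sales if r["region"] == "US"])

print(everywhere, us_only)  # 60 40
```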
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
UI5con 2024 - Keynote: Latest News about UI5 and its Ecosystem (Peter Muessig)
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
14th Edition of International Conference on Computer Vision (ShulagnaSarkar2)
About the event
14th Edition of International Conference on Computer Vision
Computer conferences organized by the ScienceFather group. ScienceFather takes the privilege to invite speakers, participants, students, delegates, and exhibitors from across the globe to its International Conference on Computer Vision, to be held in various beautiful cities of the world. The conferences are a discussion of common invention-related issues, and additionally trade information and share thoughts and insight into advanced developments in science and invention. New technology may create many materials and devices with a vast range of applications, such as in science, medicine, electronics, biomaterials, energy production, and consumer products.
Nominations are open! Don't miss it.
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
Preparing Non - Technical Founders for Engaging a Tech AgencyISH Technologies
Preparing non-technical founders before engaging a tech agency is crucial for the success of their projects. It starts with clearly defining their vision and goals, conducting thorough market research, and gaining a basic understanding of relevant technologies. Setting realistic expectations and preparing a detailed project brief are essential steps. Founders should select a tech agency with a proven track record and establish clear communication channels. Additionally, addressing legal and contractual considerations and planning for post-launch support are vital to ensure a smooth and successful collaboration. This preparation empowers non-technical founders to effectively communicate their needs and work seamlessly with their chosen tech agency.Visit our site to get more details about this. Contact us today www.ishtechnologies.com.au
5. BEXIS2 Potentials
• designed for collaborative projects
• focus on active data (i.e. project life-time)
• focus on tabular data, but not limited to it
• focus on data integration and re-use of data structures
• worldwide, modular, free and open source
nafiseh.navabpour@ufz.de SPP2089.ufz.de 5
6. The Workflow
• Registration
• Login
• Creating a data structure
• Downloading an Excel template
• Working with the Excel template
• Creating a Dataset
• Providing metadata
• Uploading the dataset
• Viewing a Dataset
• Searching
20. Creating a data structure
Restrictions
• Free to choose any column name (even special characters, e.g. [„-%&}), but prefer common, descriptive names
• Free to design the data table as you like, but aim for a re-usable structure
• Think about empty cells: "not measured" vs. "no result"
• Specify the data type of each column:
• String: free text
• Integer: …, -3, -2, -1, 0, 1, 2, 3, …
• Double: up to 16 digits (e.g. 2.455433, -0.00006)
• Decimal: up to 29 digits
• Date/Time: yyyy-MM-dd, MM/dd/yy, hh:mm, hh:mm:ss
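As an illustration (not part of the original slides): the Date/Time patterns listed above are .NET-style format strings; mapping them onto Python's strptime directives is our assumption here, but it gives a quick way to check whether a cell value conforms to a chosen pattern before upload:

```python
from datetime import datetime

# The Date/Time patterns from the slide, expressed as Python strptime
# directives (this mapping is an assumption for illustration only).
PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "MM/dd/yy":   "%m/%d/%y",
    "hh:mm":      "%H:%M",
    "hh:mm:ss":   "%H:%M:%S",
}

def matches(value, bexis_pattern):
    """Return True if `value` conforms to the given Date/Time pattern."""
    try:
        datetime.strptime(value, PATTERNS[bexis_pattern])
        return True
    except ValueError:
        return False

print(matches("2024-06-09", "yyyy-MM-dd"))  # True
print(matches("2024-13-09", "yyyy-MM-dd"))  # False (month 13 is invalid)
```

A check like this catches malformed dates early, before a dataset upload is rejected by the server-side validation.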
33. Working with the Excel template
72. Access to a dataset
74. Access to a dataset
Send a request to see a dataset
75. Access to a dataset
Make a decision
Note: It is a one-time decision!
77. Access to a dataset
rBExIS
• Exchange data between R and BExIS2
• Get single and multiple datasets
• Upload new or update existing datasets
• Install devtools: install.packages("devtools")
• Install the rBExIS package from GitHub:
  library(devtools)
  install_github("cpfaff/rBExIS", subdir = "rBExIS")
78. Access to a dataset
rBExIS - Web API Data Access
Sample REST API calls: Data
• http://spp2089.ufz.de/api/data/6
– /api/data/6?header=id,name
– /api/data/6?filter=(Grade>50 AND Grade<90)
– /api/data/6?header=id,name&filter=(Grade>50)
Sample REST API calls: Metadata
• http://spp2089.ufz.de/api/metadata/6
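The REST calls above follow a simple pattern: a base URL plus optional `header` and `filter` query parameters. A minimal Python sketch for building such URLs (the helper name and parameter handling are our own; only the base URL and parameter names come from the slides):

```python
from urllib.parse import urlencode

BASE = "http://spp2089.ufz.de/api"  # instance URL from the slides

def data_url(dataset_id, header=None, filter_expr=None):
    """Build a BEXIS2 Web API data URL with optional header/filter parameters."""
    params = {}
    if header:
        params["header"] = ",".join(header)       # e.g. header=id,name
    if filter_expr:
        params["filter"] = filter_expr            # e.g. filter=(Grade>50)
    query = "?" + urlencode(params) if params else ""
    return f"{BASE}/data/{dataset_id}{query}"

print(data_url(6))
# http://spp2089.ufz.de/api/data/6
print(data_url(6, header=["id", "name"]))
# http://spp2089.ufz.de/api/data/6?header=id%2Cname
```

The URL could then be fetched with any HTTP client (e.g. `requests.get(...)`); the response format is not specified in the slides, so consult the BEXIS2 API documentation for details.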
79. Access to a dataset
Download Dataset
81. Access to a dataset
Download Excel
82. Access to a dataset
Download Excel Template
84. Access to a dataset
Download Text
101. Thank you for your attention!
Editor's Notes
BEXIS2 is software that needs individual instances, so it is NOT a portal.
Designed for large collaborative projects as a central data management and exchange platform; best operated with a dedicated data manager.
It is designed for active data, meaning that data can evolve and be dynamic; the scope is the project life-time. For long-term preservation, data should be published to dedicated repositories, and BEXIS2 provides mechanisms/workflows for that.
Focus on tabular data, since this is the main data type in the domains we work with; more details in a minute.
Focus on data integration and re-use means: a) modelling the data structure, variables, and units of each dataset and making those accessible to others; b) re-using any information in the system to relieve users from re-entering things.
What do you think?
Could we upload this table exactly in this format? What is the problem?
You need to transform your table to a form with only one line of header.
PLEASE check the list of variables first.
You can save the data structure and come back later to edit it.
Enable macros.
Quality check.
Open the box of the information about OWNER.
Open OWNER 1.
Open the box of the information about PERSON.