Physical Science & Engineering - EUDAT

Physical Sciences & Engineering
Chair: Johannes Reetz, MPCDF - Max Planck Society
Rapporteur: Leon du Toit, University of Oslo
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065

Physical Sciences and Engineering Parallel Track

Session 1: Data Pilot presentations (15:00)
  • 15:00 Introduction to track & objectives, Johannes Reetz, MPCDF - MPS
  • 15:05 NoMaD, Raphael Ritz, Max Planck Society
  • 15:20 SIMCODE-DS Data Pilot, Matteo Nori, Bologna University
  • 15:30 Tokamak data pilot, Alys Brett & David Muir, Culham Centre for Fusion Energy
  • 15:40 TURBASE-DNS Data Pilot, Fabio Bonaccorso, University of Rome Tor Vergata & INFN
  • 15:50 NFFA-EUROPE Data Pilot, Stefano Cozzini, CNR-IOM
  • 16:00 Direct simulation data of turbulent flows Data Pilot, Javier Jimenez & Alberto Vela-Martin, U. Politécnica Madrid
  • 16:10 Discussion
  • 16:30 Networking Coffee

Session 2: Data management challenges
Discussion facilitator: Claudio Cacciari, CINECA
Challenges: Data Repository vs. LT Data sharing
Objective: discuss common data (management, stewardship) challenges
Expected outcome:
  • A series of insights into a variety of approaches and viewpoints between related communities
  • A set of (new) common needs where EUDAT could play

[Figure: EUDAT data domains. Published data domain: linking publications to digital objects, discovery of digital objects. Registered data domain: stage digital objects, register digital objects. Further labels: data objects, data entities.]

Live Data repository vs. Long Term data sharing

A data repository for live data
  • Data gets updated during its life cycle
  • Metadata and provenance information get updated
  • Collections get extended
  • Research collaborations need shared access to live, unregistered data, e.g. a Dropbox variant. Is this enough?
An archiving system for LT data sharing
  • Static data
  • Curation, data publication, certification
Can't we have a single system for all such types of data?
What is needed, what can be managed, what can be afforded?

Live Data repository vs. Long Term data sharing

Sharing & LT preservation
  • We are looking for ideas, sharing ideas, finding ideas
  • Sharing raw data
  • Publication of data: data becomes valuable to other communities after it has been published
  • The published paper is metadata. When people start reusing the data, they collect more metadata from the author that is not available in the paper. This should be fed back into the metadata store, at the risk of having too much
  • Discoverability is not such a big problem within small communities, which are informed about their own activities; across communities is where the problem is
  • Who bears the costs of storage and curation if large data sets are archived long-term?
  • Curation (selection vs. management) is difficult in the sense that it is censorship (selectivity). Who is entitled to do this? Custodian role: a knowledgeable contact person; scalability concerns
  • Finer-grained definition of the custodian role: stages of responsibility, issues of LT, knowledge transfer
  • We should rely on AI and machine learning: agents to help scale this
  • Problem solving for now: data depositors should specify an LT storage parameters policy, e.g. lifetime, setting the starting point
  • Data protection, privacy and sensitive information: different legal requirements; respond to changing demands on data creators, e.g. legitimate reasons for processing, qualified open access; managing consent implies system design decisions, e.g. PII data
  • Our systems should accommodate versioning, data corrections, provenance
  • Often data do not speak for themselves and only become useful when combined with code. This brings software maintenance and executables. How does this relate to EUDAT? Where are the lines drawn?

  • Funders should address who pays for the custodians
  • Related to the data+code combo: sometimes data capture methods necessitate software to reconstruct the data in order to make it analysable
  • Client-side software is always relevant; therefore, software maintenance is always present
  • We need to store sufficient information in order to interpret the data; define different levels; collections should contain pointers to software or other necessary tools
  • Capturing workflow (provenance) requires the execution tools to be capable of generating the relevant metadata
  • Incentives should be aligned so that scientists have reasons to provide the info needed for useful LT preservation
  • Mitigate the risk of knowledge loss by gathering as much metadata as possible
  • Consider the interrelatedness of practices vs. technologies

Live Data repository vs. Long Term data sharing

Live Repository (workspace)
  • We should rely on the users/communities to control the community-specific data management
  • Domain and problem specificity leads to very heterogeneous data, making a common live data repository difficult to deal with as a service provider
  • Usage policy: trying to reduce dimensionality is a goal for them (defining metadata is part of this effort); tools that can help with this would be useful
  • How to deal with data that needs to be post-processed on ingest?
  • Want to access live data via APIs: people mostly want the latest version; discourage people from downloading, so nobody uses the data files

  • _Really_ large scale is out of scope; we are in the mid scale of data
  • The service provider should be clear about current capabilities and future plans regarding the scale of data
  • One could present structured data in ways without knowing too much about the domain; viz services. Q: Are these tools already available?

Live Data repository vs. Long Term data sharing
  • Use APIs to abstract and get rid of heterogeneity
  • Metadata enables solutions here; communities need to provide metadata standards if they want automated solutions; the amount of useful automation is proportional to the quality and standardisation of the metadata
  • Need a community model for metadata and interfaces; communities need help to develop standards
  • RDA can help guide development of standards
  • The problem is that reaching an understanding between large communities to standardise metadata takes a _very_ long time; always evolving, several versions, no silver bullet; domain-specific metadata standards are also important; need to give knowledge to the researchers; we should also use existing ones; the creators should make tools to make this easier
  • Usage metadata: tracking users having agreed to TOUs. Support for this? Provide examples of TOUs? Manage access via TOUs?
  • Are metadata schemas in the registered domain fixed?

  • In the past people did take a lot of care with data and metadata, but it came to nothing; we need to have high requirements for long-term preservation for it to be useful

Session 3: Physical Sciences & Engineering
Live Data repository vs. Long Term data sharing
Results and conclusion

Long-term preservation aspects
  • LT preservation for sharing ideas, finding ideas, preserving ideas
  • Sharing raw data
  • Upon reuse of LT data, more metadata is collected from the author. Feeding it back into the metadata store risks an inflation of metadata and annotations
  • Curation is difficult: risk of censorship (selectivity). Who is entitled to this custodian role? The custodian role needs to be defined in detail. Scalability?
  • We could increasingly rely on AI and machine learning techniques; intelligent agents can perhaps help to reduce the deluge of data and metadata prior to preservation
  • Data depositors should specify LT preservation parameters (intentions at ingest time): policy, retention time, setting the starting point
  • Handling sensitive information: it is necessary to log the data providers' consent to use their data and to cope with the variety of legal requirements; this implies system design decisions
  • Systems should accommodate versioning, data corrections, provenance
  • LT preserved data remains useful only when linked to the preserved code
  • Collections should contain pointers to software, execution environments and workflows
  • Capturing workflows (provenance) requires the (workflow) systems to be capable of generating the relevant metadata

Live Repository
  • We should rely on the users/communities to control the community-specific data management
  • Want to access live data via APIs
  • Need a community model for metadata and interfaces; communities need help to develop standards
  • RDA can help guide development of standards
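
The points above on ingest-time preservation parameters, versioning, provenance and pointers to software could be captured in a small deposit record. The following Python sketch is only an illustration under assumptions: the class names, fields and example values are hypothetical and do not represent an EUDAT schema or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SoftwarePointer:
    """Hypothetical pointer from a collection to code needed to interpret it."""
    name: str
    version: str
    location: str  # e.g. a repository URL or persistent identifier


@dataclass
class DepositRecord:
    """Hypothetical ingest-time record; not an EUDAT schema."""
    identifier: str
    deposited_on: date                      # starting point of the retention period
    retention_years: int                    # retention time stated by the depositor
    version: int = 1
    previous_version: Optional[str] = None  # provenance link to the record it corrects
    software: List[SoftwarePointer] = field(default_factory=list)

    def retention_ends(self) -> date:
        """Return the date on which the stated retention period ends."""
        start = self.deposited_on
        end_year = start.year + self.retention_years
        try:
            return start.replace(year=end_year)
        except ValueError:
            # 29 February with a non-leap end year: fall back to 28 February.
            return start.replace(year=end_year, day=28)

    def corrected(self, new_identifier: str, deposited_on: date) -> "DepositRecord":
        """Create the next version, keeping a provenance link to this record."""
        return DepositRecord(
            identifier=new_identifier,
            deposited_on=deposited_on,
            retention_years=self.retention_years,
            version=self.version + 1,
            previous_version=self.identifier,
            software=list(self.software),
        )


if __name__ == "__main__":
    # All values below are made up for illustration.
    reader = SoftwarePointer("example-reader", "1.0", "https://example.org/reader")
    first = DepositRecord("demo-001", date(2016, 2, 29), 10, software=[reader])
    second = first.corrected("demo-002", date(2017, 1, 1))
    print(first.retention_ends(), second.version, second.previous_version)
```

Keeping the retention start date and duration as explicit fields reflects the idea that depositors state their intentions at ingest time, while `previous_version` gives corrections a traceable link instead of overwriting the earlier record.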
