SlideShare a Scribd company logo
1 of 30
Download to read offline
Social Informatics Data
             Grid
Cyberinfrastructure for Collaborative Research in the
      Neural, Social and Behavioral Sciences

                Bennett I. Bertenthal
                 Indiana University
               bbertent@indiana.edu
Infrastructure for Social and
              Behavioral Sciences
Goal:
  Compare, measure and search for patterns in structured, semi-
  structured, and heterogeneous data sets.


Challenge:
  Integrate information over time, place, and types of data


Needs:
  (1) Data interface (shared datasets & databases)
  (2) Service interface (shared tools for analysis)
  (3) Intellectual interface (shared problems & theories)
Primary Objectives

• Develop prototype of core facility for collecting multiple
  measures of time-synchronized data

• Develop integrated tools for storage, retrieval,
  annotation, and analyses of multiple data sets at
  different time scales

• Develop scripts for parallelizing code to run on grid
  clusters
What is SIDGrid?
Social Informatics Data Grid
             • A general purpose architecture
               for streaming data applications
               (e.g., video, audio, time series)
             • Built on well established
               database, multimedia and web
               and grid services standards
             • Time alignment in distributed
               heterogeneous datasets
                – Software and hardware based
                – Integrated with existing laboratory
                  time stamping and registration
                  techniques
             • Scalable
                – Number of datasets
                – Types of data
                – Multiple end user applications
Server




Client
Client Side
Client Side
• Leveraging efforts for annotation and analysis of multimodal data
   – Familiarity and Interoperability
        • Elan (Max Planck Institute for Psycholinguistics, The Netherlands)
        • Talkbank (Carnegie Mellon University, US)
        • Digital Replay System (Nottingham University, UK)
   – XML, Java
   – Cross platform interoperability
• Adding SIDGrid functionality to Elan
   – Minimally intrusive
        • Avoid complicated co-development w/ELAN team
    – Browsing SIDGrid data
    – Additional data types
    – Upload / Download to SIDGrid server
66 GB 5 mov 2 wav …
368 GB 23 mov 6 wav …
 5 GB 1 mov 0 wav …
21 GB 3 mov 12 wav …
 4 GB 9 mov 1 wav …
 4 GB 4 mov 1 wav …
 1 GB 0 mov 2 wav …
945 GB 1 mov 66 wav …
 8 GB 3 mov 0 wav …
20 GB 13 mov 2 wav …
Server Side
.mov    .wav        .eaf         GB

10      0      0           45

 4     30      0           20

 2     2       1           3

12     100         9       200

 1     1       1           1

 6     2       0           12

400     0       1          1001

 0     666      1          312

 0     0       13          0.1

 0     0       0           0.0

18      4      0           66
Search and Query
                   (4,000 projects)
• Data Files
  –   Names
  –   Keywords
  –   Attributes (keyword-value)
  –   Date
  –   Type (Elan, Chat)
• Contents of Files
  – Metadata
  – Tier
  – Annotations
Server Side
• Web services
  – Query
  – Data download / upload

• Portal interface
   – Security
   – Data and metadata browsing
   – Preview
   – Tags, attributes
   – Projects
   – Groups
   – Search
   – Data transformation using grid resources
Science Gateway
What Is The TeraGrid?
                                    (circa 2006)
 75 Teraflops (trillion calculations per second)            • 16 Supercomputers - 9 different types, multiple sizes
 = 12,500 faster than all 6 billion humans on
 earth each doing one calculation per second
                                                            • World’s fastest network
                                                            • Globus Toolkit and other middleware providing single
                                                      ANL
                                                              login, application management, data movement, web
30 Gigabits per second to large sites                         services
= 20-30 times major university connections
= 30,000 times my home broadband
= 1 full length feature film per second

                         LA                                       Starlight                   Atlanta




                  SDSC                        TACC   NCSA    PU           IU   PSC                  ORNL
Scripts for Running Jobs on Grid

• Matlab (high-level language and interactive environment for peforming
   computationally intensive tasks)

• R (software environment for statistical computing and graphics)
• Praat (software for acoustic analysis)
• Free Surfer (automated tools for reconstruction of the brain’s cortical
   surface from structural MRI data)

• AFNI (programs for processing, analyzing, and displaying FMRI data)
• SUMA (adds cortical surface based functional imaging analysis to the
   AFNI suite of programs)
Advantages of Grid Computing

• Vastly expanded computing and storage
• Reduced effort as needs scale up
• Improved resource utilization; lower costs
• Facilities and models for collaboration
• Sharing of tools, data, and procedures and
  protocols
• Recording, assessment and reuse of complex
  tasks
Lessons Learned

• Fast prototyping vs production quality software
   – After one year of development, no product available for user
     feedback
   – Optimal design vs practical design
• Public vs private website
   – Need for dissemination
   – Need for security and protection of user groups and data
• Tools for diverse user groups with varying degrees of
  technical expertise
   – Non-intuitive interface with minimal user support
       • Importance of user manuals, technical support, and FAQs
• Multiple levels of privacy and confidentiality dictated by
  type of data and informed consent
If you build it, will they come?

• Dissemination of SIDGrid
   – Website and movie
   – Invited workshops at UofC and IU
   – Pre-conference workshops
• Start-up is time consuming
   – Scale of most projects conducted by social scientists does not
     justify time to learn web services and tools
   – Added value for larger, collaborative projects requires shift in
     goals and organization of research
• Resistance to data sharing
   – Original proposal required that all data stored on SIDGrid
     servers would be publicly available
Objections to Data Sharing

•   It’s my data!
•   Protection of confidentiality and anonymity
•   Need to first establish standards for coding and analysis
•   Reporting of misleading and confusing findings
•   Raw data but not coded data should be shared
    – Annotation and coding is very time consuming and should not
      become available to others
• If availability of web and software tools were contingent
  on sharing data, most users would opt out
Questions

More Related Content

What's hot

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataHaluan Irsad
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsNguyen Cao
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big dataYukti Kaura
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Imviplav
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata Mk Kim
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...Dataconomy Media
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core conceptsMaryan Faryna
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudRightScale
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Ashok Royal
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 

What's hot (20)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...Dev Lakhani, Data Scientist at Batch Insights  "Real Time Big Data Applicatio...
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio...
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 

Viewers also liked

Hoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.pptHoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.pptJesse Lingeman
 
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDASupporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDAJesse Lingeman
 
Flyer I Maintain
Flyer I MaintainFlyer I Maintain
Flyer I Maintainkikinelson
 
Children's Multicultural Library Learning Event
Children's Multicultural Library Learning EventChildren's Multicultural Library Learning Event
Children's Multicultural Library Learning EventLaurieRogers
 
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral PatternsIts About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral PatternsJesse Lingeman
 

Viewers also liked (8)

Hoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.pptHoffman nsf presentation hoffman-25-aug11.ppt
Hoffman nsf presentation hoffman-25-aug11.ppt
 
Aslin.discussion
Aslin.discussionAslin.discussion
Aslin.discussion
 
Galloway
GallowayGalloway
Galloway
 
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDASupporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
Supporting Emergence: Interaction Design for Visual Analytics Approach to ESDA
 
Flyer I Maintain
Flyer I MaintainFlyer I Maintain
Flyer I Maintain
 
Photo album 2011
Photo album 2011Photo album 2011
Photo album 2011
 
Children's Multicultural Library Learning Event
Children's Multicultural Library Learning EventChildren's Multicultural Library Learning Event
Children's Multicultural Library Learning Event
 
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral PatternsIts About Time: Analyzing Temporal MicroLevel Behavioral Patterns
Its About Time: Analyzing Temporal MicroLevel Behavioral Patterns
 

Similar to Bertenthal

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Dr. Anita Goel
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011marpierc
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop IntroductionJayant Mukherjee
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0Amr Kamel Deklel
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcDataTactics
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingGlobus
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRebekah Rodriguez
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchYehia El-khatib
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - PresentationÉric Dusablon
 

Similar to Bertenthal (20)

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Big data
Big dataBig data
Big data
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtc
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data Sharing
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Big Data
Big Data Big Data
Big Data
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Adoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific ResearchAdoption of Cloud Computing in Scientific Research
Adoption of Cloud Computing in Scientific Research
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Cloud - NDT - Presentation
Cloud - NDT - PresentationCloud - NDT - Presentation
Cloud - NDT - Presentation
 

More from Jesse Lingeman

More from Jesse Lingeman (9)

Messinger.openshapa.091511
Messinger.openshapa.091511Messinger.openshapa.091511
Messinger.openshapa.091511
 
Mac whinney macw
Mac whinney macwMac whinney macw
Mac whinney macw
 
Gray 110916 ns-fwkshp
Gray 110916 ns-fwkshpGray 110916 ns-fwkshp
Gray 110916 ns-fwkshp
 
Davis kean.open shapa
Davis kean.open shapaDavis kean.open shapa
Davis kean.open shapa
 
Borner links
Borner linksBorner links
Borner links
 
Altman links
Altman linksAltman links
Altman links
 
Alibali mult data streams a
Alibali mult data streams aAlibali mult data streams a
Alibali mult data streams a
 
Test1
Test1Test1
Test1
 
Test2
Test2Test2
Test2
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Bertenthal

  • 1. Social Informatics Data Grid Cyberinfrastructure for Collaborative Research in the Neural, Social and Behavioral Sciences Bennett I. Bertenthal Indiana University bbertent@indiana.edu
  • 2. Infrastructure for Social and Behavioral Sciences Goal: Compare, measure and search for patterns in structured, semi- structured, and heterogeneous data sets. Challenge: Integrate information over time, place, and types of data Needs: (1) Data interface (shared datasets & databases) (2) Service interface (shared tools for analysis) (3) Intellectual interface (shared problems & theories)
  • 3. Primary Objectives • Develop prototype of core facility for collecting multiple measures of time-synchronized data • Develop integrated tools for storage, retrieval, annotation, and analyses of multiple data sets at different time scales • Develop scripts for parallelizing code to run on grid clusters
  • 5.
  • 6. Social Informatics Data Grid • A general purpose architecture for streaming data applications (e.g., video, audio, time series) • Built on well established database, multimedia and web and grid services standards • Time alignment in distributed heterogeneous datasets – Software and hardware based – Integrated with existing laboratory time stamping and registration techniques • Scalable – Number of datasets – Types of data – Multiple end user applications
  • 9. Client Side • Leveraging efforts for annotation and analysis of multimodal data – Familiarity and Interoperability • Elan (Max Planck Institute for Psycholinguistics, The Netherlands) • Talkbank (Carnegie Mellon University, US) • Digital Replay System (Nottingham University, UK) – XML, Java – Cross platform interoperability • Adding SIDGrid functionality to Elan – Minimally intrusive • Avoid complicated co-development w/ELAN team – Browsing SIDGrid data – Additional data types – Upload / Download to SIDGrid server
  • 10.
  • 11. 66 GB 5 mov 2 wav … 368 GB 23 mov 6 wav … 5 GB 1 mov 0 wav … 21 GB 3 mov 12 wav … 4 GB 9 mov 1 wav … 4 GB 4 mov 1 wav … 1 GB 0 mov 2 wav … 945 GB 1 mov 66 wav … 8 GB 3 mov 0 wav … 20 GB 13 mov 2 wav …
  • 13. .mov .wav .eaf GB 10 0 0 45 4 30 0 20 2 2 1 3 12 100 9 200 1 1 1 1 6 2 0 12 400 0 1 1001 0 666 1 312 0 0 13 0.1 0 0 0 0.0 18 4 0 66
  • 14.
  • 15.
  • 16. Search and Query (4,000 projects) • Data Files – Names – Keywords – Attributes (keyword-value) – Date – Type (Elan, Chat) • Contents of Files – Metadata – Tier – Annotations
  • 17. Server Side • Web services – Query – Data download / upload • Portal interface – Security – Data and metadata browsing – Preview – Tags, attributes – Projects – Groups – Search – Data transformation using grid resources
  • 18.
  • 20.
  • 21. What Is The TeraGrid? (circa 2006) 75 Teraflops (trillion calculations per second) • 16 Supercomputers - 9 different types, multiple sizes = 12,500 faster than all 6 billion humans on earth each doing one calculation per second • World’s fastest network • Globus Toolkit and other middleware providing single ANL login, application management, data movement, web 30 Gigabits per second to large sites services = 20-30 times major university connections = 30,000 times my home broadband = 1 full length feature film per second LA Starlight Atlanta SDSC TACC NCSA PU IU PSC ORNL
  • 22.
  • 23.
  • 24. Scripts for Running Jobs on Grid • Matlab (high-level language and interactive environment for peforming computationally intensive tasks) • R (software environment for statistical computing and graphics) • Praat (software for acoustic analysis) • Free Surfer (automated tools for reconstruction of the brain’s cortical surface from structural MRI data) • AFNI (programs for processing, analyzing, and displaying FMRI data) • SUMA (adds cortical surface based functional imaging analysis to the AFNI suite of programs)
  • 25. Advantages of Grid Computing • Vastly expanded computing and storage • Reduced effort as needs scale up • Improved resource utilization; lower costs • Facilities and models for collaboration • Sharing of tools, data, and procedures and protocols • Recording, assessment and reuse of complex tasks
  • 26.
  • 27. Lessons Learned • Fast prototyping vs production quality software – After one year of development, no product available for user feedback – Optimal design vs practical design • Public vs private website – Need for dissemination – Need for security and protection of user groups and data • Tools for diverse user groups with varying degrees of technical expertise – Non-intuitive interface with minimal user support • Importance of user manuals, technical support, and FAQs • Multiple levels of privacy and confidentiality dictated by type of data and informed consent
  • 28. If you build it, will they come? • Dissemination of SIDGrid – Website and movie – Invited workshops at UofC and IU – Pre-conference workshops • Start-up is time consuming – Scale of most projects conducted by social scientists does not justify time to learn web services and tools – Added value for larger, collaborative projects requires shift in goals and organization of research • Resistance to data sharing – Original proposal required that all data stored on SIDGrid servers would be publicly available
  • 29. Objections to Data Sharing • It’s my data! • Protection of confidentiality and anonymity • Need to first establish standards for coding and analysis • Reporting of misleading and confusing findings • Raw data but not coded data should be shared – Annotation and coding is very time consuming and should not become available to others • If availability of web and software tools were contingent on sharing data, most users would opt out