Today’s world is awash in data, and organizations are rapidly discovering that putting this data to work is the single most important factor in their ability to remain relevant to hyper-connected consumers. In this session, HP will explore the new trends of this appified, thingified, context-rich world and how HP’s Haven platform can give you an edge over your competition.
This introduction to graph databases is specifically designed for Enterprise Architects who need to map business requirements to architectural components like graph databases. It explains how and why graphs matter for Enterprise Architecture and reviews the architectural differences between relational and graph models.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics, looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks (Hortonworks)
Whether you are an insurer, reinsurer, broker or insurance service provider, everything you do is based on analytics. From underwriting to claims to agency and marketing, the smartest and most streamlined business operations at insurance companies are driven by advanced and intelligent analytics. But is your data ready? Are you an “Analytics Ready” insurer? Great analytics starts with great data management. Join us as industry experts from Informatica and Hortonworks share industry trends and best practices to show you how to become an “Analytics Ready” insurer.
The Scout24 Data Platform (A Technical Deep Dive) (Raffael Dzikowski)
The Scout24 Data Platform powers all reporting, ad hoc analytics and machine learning products at AutoScout24 and ImmobilienScout24. In this talk, we will take a technical deep dive into our modern, cloud-based big data platform. We will discuss our evolution of approaches to ingestion, ETL, access control, reporting, and machine learning with a focus on in-the-trenches learnings gained from our many failures and successes as we migrated from a traditional Oracle Data Warehouse to an AWS-based data lake.
Hadoop 2.0: YARN to Further Optimize Data Processing (Hortonworks)
Data is growing exponentially in both type and volume, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks; Imad Birouty, Director of Technical Product Marketing at Teradata; and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems are needed to exploit the variety and volume of data sources, including a flexible data repository. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Hadoop (Shirshanka Das)
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity for Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop. They also explore Dali, a data abstraction layer that can help you process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem, explain how to create maintainable data contracts between data producers and data consumers (such as data scientists and data analysts), and show how to standardize data effectively in a growing organization to enable, rather than slow down, innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API acting as a protective and enabling shield. Along the way, they share observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets, and offer an overview of LinkedIn’s governance model: the tools, processes and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
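The idea of a maintainable data contract between producers and consumers can be sketched in plain Python. This is an illustrative assumption only: the `PageView`-style field names and types below are invented for the example and are not taken from LinkedIn's Dali or its Kafka tooling.

```python
# Minimal sketch of a producer/consumer "data contract": a declared set of
# fields and types that every published record must satisfy. Field names
# are hypothetical, chosen only to illustrate the idea.

CONTRACT = {
    "member_id": int,
    "page_key": str,
    "timestamp_ms": int,
}

def validate(record: dict, contract: dict = CONTRACT) -> list:
    """Return a list of contract violations; an empty list means the record passes."""
    errors = []
    for field, expected_type in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

good = {"member_id": 42, "page_key": "feed", "timestamp_ms": 1700000000000}
bad = {"member_id": "42", "page_key": "feed"}  # wrong type, missing field

print(validate(good))  # []
print(validate(bad))
```

In practice this role is played by a schema registry and serialization format (e.g. Avro schemas checked at publish time) rather than hand-rolled checks, but the contract idea is the same: violations are caught at the producer boundary, not discovered downstream by analysts.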
This is the deck that I used for the talks that I gave in the Silicon Valley / San Francisco bay area at various events in April and May 2016.
1. Introduces Big Data and related challenges.
2. Briefly covers some of the important open-source big data technologies.
3. Introduces Hadoop.
4. Introduces Spark Core, Spark SQL, MLlib and GraphX.
International Big Data keynote speaker Mark van Rijmenam shared his vision on Hadoop data lakes during a Zaloni webinar: the Hadoop data lake trends for 2016, the key data lake challenges, and how organizations can benefit from data lakes.
Strata 2015 presentation from Oracle for Big Data, announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL and Oracle NoSQL.
Using data relationships to make connections between individual data records transforms the data you already have into something much more powerful. This webinar will explain how both young and established companies have adopted graph thinking - and how they’ve risen to dominate their fields.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Baltagi has worked in various architecture, design, development and consulting roles at Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen and Deutsche Bahn.
Mr. Baltagi also has over 14 years of IT experience, with an emphasis on full-lifecycle development of enterprise web applications using Java and open-source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE, PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MySQL, PostgreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Jasper Reports, Alfresco, YSlow, Terracotta, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax, XStream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
In this slide deck, Infochimps Director of Product Tim Gasper discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days, sometimes in just hours. Tim explains how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with remarkable speed.
Big Data, Hadoop, Hortonworks and Microsoft HDInsight (Hortonworks)
Big Data is everywhere. And at the center of the big data discussion is Apache Hadoop, a next-generation enterprise data platform that allows you to capture, process and share the enormous amounts of new, multi-structured data that doesn’t fit into traditional systems.
With Microsoft HDInsight, powered by Hortonworks Data Platform, you can bridge this new world of unstructured content with the structured data we manage today. Together, we bring Hadoop to the masses as an addition to your current enterprise data architectures so that you can amass net new insight without net new headache.
It is almost impossible to escape the topic of Data Science. While the core of Data Science has remained the same over the last decade, its emergence to the forefront is spurred by both the availability of new data types and a true realization of the value it delivers. In this session, we will provide an overview of data science and the different classes of machine learning algorithms, and deliver an end-to-end demonstration of machine learning using Hadoop. Audience: Developers, Data Scientists, Architects and System Engineers.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=4175a7421d00257f33df146f50c41af8
Big Data Real Time Analytics - A Facebook Case Study (Nati Shalom)
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real-time analytics system is a good reference for those looking to build their own real-time analytics systems for big data.
The first part covers the lessons from Facebook's experience and the reason they chose HBase over Cassandra.
In the second part of the session, we learn how to build our own real-time analytics system, achieve better performance, gain real business insights and analytics on our big data, and make deployment and scaling significantly simpler using the new versions of Cassandra and GigaSpaces Cloudify.
Keynote "#DBInsights", given on 7 April. My views on DBAs' fears, doubts and opportunities in the age of DevOps, Cloud, Big Data, Open Source, bi-modal IT, pizza teams, you name it.
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth (hosted by Confluent)
For data-driven enterprises, the most important objective is unlocking the value of their data. To enable this, data scientists are increasingly turning towards data discovery tools (also known as data catalogs) that can help them locate the right dataset or insight and use it correctly. But are all data catalogs the same?
In this talk, I describe how a stream-first architecture was a critical design element that benefited the implementation of our data catalog. We follow the evolution of LinkedIn DataHub’s architecture over the past few years from a simple search tool to a streaming metadata platform that drives productivity and governance workflows across the company.
Join this talk to learn:
* How different data discovery / catalog tools are architected and the tradeoffs in each kind of architecture
* How streaming architectures can benefit metadata
* How event-driven metadata architectures can supercharge your data productivity and governance workflows at your company
NoSQL Simplified: Schema vs. Schema-less (InfiniteGraph)
A look at the many facets of schema-less approaches versus a rich-schema approach, ranging from performance and query support to heterogeneity and code/data migration issues. Presented by Leon Guzenda, Founder, Objectivity.
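The schema vs. schema-less trade-off the talk describes can be sketched in a few lines of Python. This is an illustrative contrast only, not taken from the Objectivity presentation; the `Customer` record and field names are invented.

```python
# Rich schema vs. schema-less, side by side (hypothetical example).
from dataclasses import dataclass

@dataclass
class Customer:
    # Rich schema: structure and types are declared once and enforced
    # at construction time, so readers can rely on the shape.
    id: int
    name: str

def load_schemaless(doc: dict) -> dict:
    # Schema-less: any document shape can be stored, so every reader must
    # defend against missing or renamed fields. This is the code/data
    # migration burden: old documents used "full_name", new ones use "name".
    return {"id": doc.get("id"), "name": doc.get("name") or doc.get("full_name")}

c = Customer(id=1, name="Ada")                       # shape guaranteed
d = load_schemaless({"id": 2, "full_name": "Grace"})  # older document shape
print(c.name, d["name"])
```

The schema-less reader stays compatible with both document generations, but only because the migration logic now lives in application code rather than in the schema.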
"Empower Developers with HPE Machine Learning and Augmented Intelligence" - Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE (Dataconomy Media)
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
About the Author:
Abdou Faye is a Subject Matter Expert in Big Data, Predictive Analytics / Machine Learning and Business Intelligence, with more than 19 years of experience in the area in various leading and executive roles, from technical, architecture and sales perspectives. He recently joined HPE from SAP, where he had led the Predictive Analysis & Big Data CoE (Center of Excellence) business since 2010 for the DACH, CEE and CIS regions, in charge of business development and sales support. Prior to SAP, he worked four years at Microsoft as a Senior BI & SQL Server Consultant in Switzerland, after ten years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD on Data Mining/Predictive Analytics after a Master in Computer Science.
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Data Platform (DataWorks Summit)
The Finance Data Lake's objective is to create a centralized enterprise data repository for all Finance and Supply Chain data, serving as the single source of truth. It enables a self-service discovery analytics platform for business users to answer ad hoc business questions and derive critical insights. The data lake is based on the open-source Hadoop big data platform and is a very cost-effective solution for breaking down ERP data silos and simplifying the data architecture in the enterprise.
POCs were conducted on an in-house Hortonworks Hadoop data platform to validate cluster performance at production volumes. Based on business priorities, an initial roadmap was defined using 3 data sources: 2 SAP ERPs and PeopleSoft (OLTP systems). A development environment was established in the AWS cloud for agile delivery. The near-real-time data ingestion architecture for the data lake was defined using replication tools and a custom Sqoop-based micro-batching framework, with data persisted in Apache Hive in ORC format. Data and user security is implemented using Apache Ranger, with sensitive data stored at rest in encryption zones. Business data sets were developed as Hive scripts and scheduled using Oozie. Connectivity for multiple reporting tools, including SQL tools, Excel and Tableau, was enabled for self-service analytics. Upon successful implementation of the initial phase, a full roadmap was established to extend the Finance data lake to over 25 data sources, scale up data ingestion, and enable OLAP tools on Hadoop.
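The micro-batching idea behind such an ingestion framework can be sketched in plain Python. This is a hedged illustration of the general pattern (accumulate records until a size or time bound is reached, then flush a batch downstream), not the actual Verizon framework; the batch limits and record shape are assumptions.

```python
# Generic time/size-bounded micro-batching sketch (hypothetical parameters).
import time

def micro_batches(records, max_size=3, max_wait_s=60.0, clock=time.monotonic):
    """Yield batches of records whenever the size cap or the wait cap is hit.

    In a real ingestion pipeline each yielded batch would be handed to a
    loader (e.g. a Sqoop job writing to a Hive ORC table); here we just
    group an in-memory iterable.
    """
    batch, started = [], clock()
    for rec in records:
        batch.append(rec)
        if len(batch) >= max_size or clock() - started >= max_wait_s:
            yield batch
            batch, started = [], clock()
    if batch:  # flush the final partial batch so no records are lost
        yield batch

batches = list(micro_batches(range(7), max_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The time bound keeps latency near-real-time during quiet periods, while the size bound caps memory and keeps downstream loads to a predictable unit of work.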
Insurance companies of all sizes are challenged to keep up with emerging technologies that deliver a competitive advantage. Recording: https://www.brighttalk.com/webcast/9573/192877
Big data holds the key to greater customer insight and stronger customer relationships. But risk of sensitive data exposure — and compliance violations — keeps many insurers from pursuing big data initiatives and reaping the rewards of business-driven analytics. Join Dataguise and Hortonworks for this live webinar to learn how you can free your organization from traditional information security constraints and unlock the power of your most valuable business assets.
• What do you need to know about PII/PHI privacy before embarking on big data initiatives?
• Why do so many big data initiatives fail before they’ve even begun—and what can you do about it?
• How can IT security organizations help data scientists extract more business value from their data?
• How are leading insurance companies leveraging big data to gain competitive advantage?
Datomic – A Modern Database - StampedeCon 2014 (StampedeCon)
At StampedeCon 2014, Alex Miller (Cognitect) presented "Datomic – A Modern Database."
Datomic is a distributed database designed to run on next-generation cloud architectures. Datomic stores facts and retractions using a flexible schema, consistent transactions, and a logic-based query language. The focus on facts over time gives you the ability to look at the state of the database at any point in time and traverse your transactional data in many ways.
We’ll take a tour of the Datomic data model, transactions, query language, and architecture to highlight some of the unique attributes of Datomic and why it is an ideal modern database.
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016 (StampedeCon)
Have you ever wanted to analyze sensor data that arrives every second from across the world? Or maybe you want to analyze intra-day trading prices of millions of financial instruments? Or take all the page views from Wikipedia and compare the hourly statistics? To do this or any similar analysis, you will need to analyze large sequences of measurements over time. And what better way to do this than with Apache Spark? In this session we will dig into how to consume data, analyze it with Spark, and then store the results in Apache Cassandra.
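The hourly roll-up described above can be shown in miniature with plain Python. This is only a sketch of the aggregation a Spark job would perform at scale before writing to Cassandra; the `(sensor, epoch_seconds, value)` tuples are invented sample data, and a real pipeline would use Spark's grouping operators rather than a dictionary.

```python
# Hourly average per sensor over timestamped readings (illustrative only).
from collections import defaultdict

def hourly_averages(readings):
    """readings: iterable of (sensor_id, epoch_seconds, value) tuples.

    Returns {(sensor_id, hour_start_epoch): mean value for that hour}.
    """
    sums = defaultdict(lambda: [0.0, 0])      # (sensor, hour) -> [total, count]
    for sensor, ts, value in readings:
        hour = ts - ts % 3600                  # truncate timestamp to the hour
        acc = sums[(sensor, hour)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

data = [("s1", 0, 10.0), ("s1", 1800, 20.0), ("s1", 3600, 30.0)]
print(hourly_averages(data))  # {('s1', 0): 15.0, ('s1', 3600): 30.0}
```

In Cassandra, the `(sensor, hour)` key of the result maps naturally onto a partition key plus clustering column, which is what makes time-bucketed tables a common target for this kind of job.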
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
This is the deck that I used for the talks that I gave in the Silicon Valley / San Francisco bay area at various events in April and May 2016.
1. Introduces Big Data and related challenges.
2. Briefly covers some of the important open-source big data related technologies.
3. Introduces Hadoop
4. Introduces Spark Core, Spark SQL, MLlib and GraphX
Big Data International Keynote Speaker Mark van Rijmenam shared his vision on Hadoop Data Lakes during a Zaloni Webinar. What are the Hadoop Data Lake trends for 2016, what are the data lake challenges and how can organizations benefit from data lakes.
Strata 2015 presentation from Oracle for Big Data - we are announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL and Oracle NoSQL
Using data relationships to make connections between individual data records transforms the data you already have into something much more powerful. This webinar will explain how both young and established companies have adopted graph thinking - and how they’ve risen to dominate their fields.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days; sometimes in just hours. Tim unlocks how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Big Data, Hadoop, Hortonworks and Microsoft HDInsightHortonworks
Big Data is everywhere. And at the center of the big data discussion is Apache Hadoop, a next-generation enterprise data platform that allows you to capture, process and share the enormous amounts of new, multi-structured data that doesn’t fit into transitional systems.
With Microsoft HDInsight, powered by Hortonworks Data Platform, you can bridge this new world of unstructured content with the structured data we manage today. Together, we bring Hadoop to the masses as an addition to your current enterprise data architectures so that you can amass net new insight without net new headache.
It is almost impossible to escape the topic of Data Science. While the core of Data Science has remained the same over the last decade, it’s emergence to the forefront is spurred by both the availability of new data types and a true realization of the value that it delivers. In this session, we will provide an overview of data science, the different classes of machine learning algorithm and deliver an end-to-end demonstration of performing Machine Learning Using Hadoop. Audience: Developers, Data Scientist Architects and System Engineers.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=4175a7421d00257f33df146f50c41af8
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real time analytics system is a good reference for those looking to build their real time analytics system for big data.
The first part covers the lessons from Facebook's experience and the reason they chose HBase over Cassandra.
In the second part of the session, we learn how we can build our own Real Time Analytics system, achieve better performance, gain real business insights, and business analytics on our big data, and make the deployment and scaling significantly simpler using the new version of Cassandra and GigaSpaces Cloudify.
KeyNote #DBInsights" on 7 April. My views on the DBAs fears, doubts and opportunities in the age of DevOps, Cloud, Big Data, Open Source, bi-modal IT, Pizza teams, you name it.
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthHostedbyConfluent
"For data-driven enterprises, the most important objective is unlocking the value of their data. To enable this, data scientists are increasingly turning towards data discovery tools (also known as data catalogs) that can help them locate the right dataset or insight and use it correctly. But are all data catalogs the same?
In this talk, I describe how a stream-first architecture was a critical design element that benefited the implementation of our data catalog. We follow the evolution of LinkedIn DataHub’s architecture over the past few years from a simple search tool to a streaming metadata platform that drives productivity and governance workflows across the company.
Join this talk to learn:
* How different data discovery / catalog tools are architected and the tradeoffs in each kind of architecture
* How streaming architectures can benefit metadata
* How event-driven metadata architectures can supercharge your data productivity and governance workflows at your company"
NoSQL Simplified: Schema vs. Schema-lessInfiniteGraph
A look at the many facets of schema-less approaches vs a rich schema approach, ranging from performance and query support to heterogeneity and code/data migration issues. Presented by Leon Guzenda, Founder, Objectivity
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is Subject Matter Expert in Big Data, Predictive Analytics / Machine Learning and Business Intelligence, with more than 19 years of experience in that area in various leading and executive roles, both from a Technical, Architecture and Sales perspectives. He recently joins HPE coming from SAP, where he was leading the Predictive Analysis & Big Data CoE (Center Of Excellence) business since 2010 for DACH, CEE and CIS region, in charge of Business Development and Sales Support. Prior to SAP, he worked 4 Years at Microsoft as Senior BI & SQL-Server Consultant in Switzerland, after 10 years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD on Data Mining/Predictive Analytics, after completing a Master in Computer Science.
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
Finance Data Lake objective is to create a centralized enterprise data repository for all Finance and Supply Chain data. It serves as the single source of truth. It enables a self-service discovery Analytics platform for business users to answer adhoc business questions and derive critical insights. The data lake is based on open source Hadoop big data platform and a very cost effective solution in breaking the ERP data silos and simplifying the data architecture in the enterprise.
POCs were conducted on in-house Hortonworks Hadoop data platform to validate the cluster performance for Production volumes. Based on business priorities, an initial roadmap was defined using 3 data sources including 2 SAP ERPs and Peoplesoft (OLTP systems). Development environment was established in AWS Cloud for agile delivery. The near real time data ingestion architecture for the data lake was defined using replication tools and custom SQOOP based micro-batching framework and data persisted in Apache Hive DB in ORC format. Data and user security is implemented using Apache Ranger and sensitive data stored at rest in encryption zones. Business data sets were developed in Hive scripts and scheduled using Oozie. Multiple reporting tools connectivity including SQL tools, Excel and Tableau were enabled for Self-service Analytics. Upon successful implementation of the initial phase, a full roadmap is established to extend the Finance data lake to over 25 data sources and enhance data ingestion to scale as well as enable OLAP tools on Hadoop.
Insurance companies of all sizes are challenged to keep up with emerging technologies that deliver a competitive advantage. Recording: https://www.brighttalk.com/webcast/9573/192877
Big data holds the key to greater customer insight and stronger customer relationships. But risk of sensitive data exposure — and compliance violations — keeps many insurers from pursuing big data initiatives and reaping the rewards of business-driven analytics. Join Dataguise and Hortonworks for this live webinar to learn how you can free your organization from traditional information security constraints and unlock the power of your most valuable business assets.
• What do you need to know about PII/PHI privacy before embarking on big data initiatives?
• Why do so many big data initiatives fail before they’ve even begun—and what can you do about it?
• How can IT security organizations help data scientists extract more business value from their data?
• How are leading insurance companies leveraging big data to gain competitive advantage?
Datomic – A Modern Database - StampedeCon 2014 (StampedeCon)
At StampedeCon 2014, Alex Miller (Cognitect) presented "Datomic – A Modern Database."
Datomic is a distributed database designed to run on next-generation cloud architectures. Datomic stores facts and retractions using a flexible schema, consistent transactions, and a logic-based query language. The focus on facts over time gives you the ability to look at the state of the database at any point in time and traverse your transactional data in many ways.
We’ll take a tour of the Datomic data model, transactions, query language, and architecture to highlight some of the unique attributes of Datomic and why it is an ideal modern database.
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016 (StampedeCon)
Have you ever wanted to analyze sensor data that arrives every second from across the world? Or maybe you want to analyze intra-day trading prices of millions of financial instruments? Or take all the page views from Wikipedia and compare the hourly statistics? To do this or any similar analysis, you will need to analyze large sequences of measurements over time. And what better way to do this than with Apache Spark? In this session we will dig into how to consume data, analyze it with Spark, and then store the results in Apache Cassandra.
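As a plain-Python illustration of the core computation (the session itself uses Spark and Cassandra at scale), hourly statistics boil down to bucketing timestamped measurements by hour and reducing per key; the sample records below are invented:

```python
from collections import defaultdict
from datetime import datetime

# (timestamp, page, views) records, as might arrive from a Wikipedia feed.
events = [
    ("2016-05-01T10:05:00", "Main_Page", 120),
    ("2016-05-01T10:47:00", "Main_Page", 95),
    ("2016-05-01T11:02:00", "Main_Page", 130),
]

# Bucket by (page, hour) and sum views: the same shape a Spark
# reduceByKey over (key, value) pairs would produce.
hourly = defaultdict(int)
for ts, page, views in events:
    hour = datetime.fromisoformat(ts).replace(minute=0, second=0)
    hourly[(page, hour.isoformat())] += views

print(dict(hourly))
# {('Main_Page', '2016-05-01T10:00:00'): 215, ('Main_Page', '2016-05-01T11:00:00'): 130}
```

In the Spark/Cassandra version, the (page, hour) pair would become the partition and clustering key of the results table so that hourly rollups are cheap to query.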
Making Machine Learning Work in Practice - StampedeCon 2014 (StampedeCon)
At StampedeCon 2014, Kilian Q. Weinberger (Washington University) presented "Making Machine Learning work in Practice."
Here, Kilian will go over common pitfalls and tricks on how to make machine learning work.
Enabling Key Business Advantage from Big Data through Advanced Ingest Process... (StampedeCon)
At StampedeCon 2014, Ronald Indeck (VelociData) presented "Enabling Key Business Advantage from Big Data through Advanced Ingest Processing."
All too often we see critical data dumped into a "Data Lake", causing the data waters to stagnate and become a "Data Swamp". We have found that many data transformation, quality, and security processes can be addressed a priori on ingest to improve data quality and accessibility. Data can still be stored in raw form if desired, but this processing on ingest can unlock operational effectiveness and competitive advantage by integrating fresh and historical data, enabling the full potential of the data. We will discuss the underpinnings of stream processing engines, review several relevant business use cases, and discuss future applications.
Batch and Real-time EHR updates into Hadoop - StampedeCon 2015 (StampedeCon)
At the StampedeCon 2015 Big Data Conference: Mercy has built a system using batch and streaming technology to allow batch and near real-time updates to flow from its Epic EHR (Electronic Health Records) system into our Hadoop cluster. Mercy is using this system to provide reporting and analytics capabilities to its researchers, business owners, and physicians. The system uses Sqoop, Flume, Pig, Hive, Oozie, and Storm. As part of this system, Mercy developed a generic database replication utility that makes it a one-line configuration change to add new tables and relational data sources.
This session will describe the batch and streaming engines that Mercy uses to integrate the patient data. Adam will talk about how the system came together and where it is headed in the future.
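The "one-line configuration change" idea mentioned above can be sketched as a table registry that a generic replication job expands into work items; all table and schema names below are invented, not Mercy's actual utility:

```python
# Hypothetical one-line-per-table replication config, in the spirit of
# the generic database replication utility described above.
REPLICATED_TABLES = [
    # (source table, key column, incremental column)
    ("PAT_ENC",   "PAT_ENC_CSN_ID", "UPDATE_DATE"),
    ("ORDER_MED", "ORDER_MED_ID",   "UPDATE_DATE"),
]

def plan_jobs(tables):
    """Expand each config line into a batch-replication job description.

    A real utility would hand these descriptions to Sqoop/Oozie (batch)
    or a streaming path (Flume/Storm); here we only build the plan.
    """
    return [
        {
            "source": table,
            "target": f"hive.ehr_stage.{table.lower()}",
            "key": key,
            "incremental_column": inc,
        }
        for table, key, inc in tables
    ]

for job in plan_jobs(REPLICATED_TABLES):
    print(job["source"], "->", job["target"])
```

The payoff of this design is that onboarding a new table touches only the config list, not the replication code.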
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz... (StampedeCon)
The collection and use of Big Data has become an important part of modern business practice. The Internet of Things (IoT) movement promises to provide new opportunities for businesses interested in the intersection of people and technology. It is also wrought with pitfalls for practitioners and researchers who struggle to make sense of an increasing cacophony of signals. How should they poll and collect data from millions of signals in a way that is manageable, scalable, and statistically valid? How should they analyze and predict using these data? This presentation will discuss these challenges with applied examples from monitoring and managing one of the world’s largest computers.
If you are used to traditional databases, then only appending and never updating your data may sound like a crazy idea. However, not only does this enable historical queries, it also enhances fault tolerance and scalability. In this presentation we briefly describe two immutable data stores (Rich Hickey's Datomic and Greg Young's EventStore) and compare their different data models using an example problem domain. Along the way we learn about CQRS, Aggregates, Projections, and why you want your data to be immutable.
EventStore is a data store for applications using event sourcing and time-series data. EventStore runs on .NET and Mono.
Datomic is a database of time-based facts, with declarative queries and ACID transactions. Datomic is written in Clojure and runs on the JVM.
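The append-only idea behind both stores can be shown with a tiny event log and a replay function: state is derived from immutable facts, so querying the past is just replaying fewer of them. This is a conceptual sketch, not the Datomic or EventStore API:

```python
# Immutable facts: (sequence number, entity, attribute, value).
log = []

def append(entity, attribute, value):
    """Record a new fact; existing facts are never modified."""
    log.append((len(log), entity, attribute, value))

def state_at(seq):
    """Replay the log up to `seq` to reconstruct the state at that point
    in time (a simple projection, in CQRS terms)."""
    view = {}
    for s, entity, attribute, value in log:
        if s > seq:
            break
        view[(entity, attribute)] = value
    return view

append("account/1", "balance", 100)
append("account/1", "balance", 250)

print(state_at(0))  # state after the first fact only
print(state_at(1))  # current state
```

Because old facts are never overwritten, replication and fault tolerance reduce to copying an append-only sequence, which is much easier to scale than coordinating in-place updates.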
Lynn Wong: Make a difference with big data - HP (Vu Hung Nguyen)
Top Technology Trends for 2014
• Emergence of the Mobile Cloud: the mobile distributed computing paradigm will lead to an explosion of new services.
• From Internet of Things to Web of Things: connectivity and internetworking are needed to link the physical and the digital.
• From Big Data to Extreme Data: simpler analytics tools are needed to leverage the data deluge.
• The Revolution Will Be 3D: new tools and techniques bring 3D printing power to the masses.
• Supporting New Learning Styles: online courses demand a seamless, ubiquitous approach.
• Next-generation mobile networks: mobile infrastructure must catch up with user needs.
• Balancing Identity and Privacy: growing risks and concerns about social networks.
• Smart and Connected Healthcare: intelligent systems and assistive devices will improve health.
• E-Government: interoperability is a big challenge to delivering information.
• Scientific Cloud Computing: key to solving grand challenges and pursuing breakthroughs.
Take the Big Data Challenge - Take Advantage of ALL of Your Data, 16 Sept 2014 (pietvz)
A customer service call can transform internal processes. Information in Tweets and reviews can lead to better products. Structured and unstructured data brought together can reveal patterns and relationships that unlock powerful business opportunities. We will discuss real-world use cases and best practices for building the infrastructure you need to power Big Data analytics solutions. From the latest in Hadoop innovation, cognitive computing, and cloud-based analytical web services, you will learn how organizations large and small can harness the power of unstructured human information to create, deploy, and deliver the next generation of analytics applications.
Powered by HP IDOL, HP Autonomy delivers intelligent applications that allow your organization to understand the concepts and context of all information in real time, mitigating risk and identifying opportunity. Join us at this session to learn how HP Autonomy can unlock the value of your company’s structured and unstructured data for better insight and greater competitive advantage. HP IDOL, the OS for human information, enables you to index, manage, and process all your data, both structured and unstructured. Learn how HP IDOL delivers unprecedented insights into optimized architecture, scalability, performance, mapped security, and connectivity. Find out more about IDOLOnDemand.com and how you can leverage this revolutionary technology in your own organization.
HP Helion - Copaco Cloud Event 2015 (break-out 4) (Copaco Nederland)
HP Helion CloudSystem is the most complete, integrated, and open cloud solution on the market. Powered by OpenStack® technology and developed with an emphasis on automation and ease-of-use, HP Helion CloudSystem redefines how you build and manage cloud services.
HUG (Hadoop User Group) meetup of 29 January 2015 at HP.
Slide deck for the 3 talks below:
#1: Processing unstructured data (videos, images, …) with Haven for Hadoop,
#2: Apache Flink: Fast and Reliable Large-scale Data Processing,
#3: Case study: a Hadoop project in the HR domain with Capgemini.
Document vectorization: making unstructured information comparable, and new opportunities for a player in the employment sector
Transforming records management for Information Governance
•Access and understand virtually any source of information on-premises and in the cloud
•A strategic pillar of HP's HAVEn Big Data platform
•Non-disruptive, manage-in-place approach that complements any organization
Presentation at the XXIX Encuentro de Telecomunicaciones y Economía Digital by Marisa Felipe, Director General of Operations, HP Services Southern Europe.
Big Data; Big Potential: How to find the talent who can harness its power (Lucas Group)
Big Data is in its infancy but it holds great promise. The key to success is finding and keeping the talent with the skills necessary to obtain and analyze the data, ask the right questions, and present findings in a compelling fashion that makes sense for your organization.
Making Big Data a First Class citizen in the enterprise (Tony Baer)
Big Data emerged at Internet companies as special projects managed by elite practitioners to solve unique problems. This approach will not be sustainable for enterprises. This presentation describes how Big Data projects must become part of the fabric of your enterprise if they are to succeed.
Modernizing Architecture for a Complete Data Strategy (Cloudera, Inc.)
Data is the future of business. Either take advantage of it, or get surpassed by those who do.
In this webinar, Ovum's Tony Baer discusses the importance of building a modern data strategy that ensures your journey with Apache Hadoop and big data is a successful one. Together, we'll walk through how to build a plan for long-term success while realizing short-term gains, including:
How to pinpoint the business goals that matter most
How to assess your strengths and weaknesses to meet those goals
How to build a thoughtful approach that ensures your initiatives succeed
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo... (StampedeCon)
Despite widespread adoption and success, most machine learning models remain black boxes. Many times users and practitioners are asked to implicitly trust the results. However, understanding the reasons behind predictions is critical in assessing trust, which is fundamental if one is asked to take action based on such models, or even to compare two similar models. In this talk I will (1) formulate the notion of interpretability of models, (2) provide a review of various attempts and research initiatives to solve this very important problem, and (3) demonstrate real industry use cases and results, focusing primarily on Deep Neural Networks.
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017 (StampedeCon)
Words are no longer sufficient in delivering the search results users are looking for, particularly in relation to image search. Text and languages pose many challenges in describing visual details and providing the necessary context for optimal results. Machine Learning technology opens a new world of search innovation that has yet to be applied by businesses.
In this session, Mike Ranzinger of Shutterstock will share a technical presentation detailing his research on composition aware search. He will also demonstrate how the research led to the launch of AI technology allowing users to more precisely find the image they need within Shutterstock’s collection of more than 150 million images. While the company released a number of AI search enabled tools in 2016, this new technology allows users to search for items in an image and specify where they should be located within the image. The research identifies the networks that localize and describe regions of an image as well as the relationships between things. The goal of this research was to improve the future of search using visual data, contextual search functions, and AI. A combination of multiple machine learning technologies led to this breakthrough.
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017 (StampedeCon)
In many modern applications data are collected in unusual form. Connectome or brain imaging data are graphs. Wearable devices measuring activity are functions over time. In many cases these objects are collected for each individual or transaction leaving the statistician with the challenge of analyzing populations of data not in classical numeric and categorical formats in big spreadsheets. In this talk I introduce object oriented data analysis with an application we recently developed for regression analysis. This talk will be aimed at the general data scientist and emphasis on the concepts and not mathematical detail. The take home message is how can we use covariates (i.e., meta-data) to predict what the structure of a brain image graph will be.
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam... (StampedeCon)
This talk aims to dive into the technical details of machine learning model development and implementation, and the value it brings to the Monsanto breeding pipeline. We genotype over 100 million seeds a year in order to save field resources and product development cycle time. Automation and high-throughput production from the lab are key to R&D success. In-house predictive model development incorporated a random forest ensemble approach with additional features derived from a Gaussian mixture model. The results show over 95% accuracy with less than 1% false positives/negatives. The model is highly generalizable, with over 10 million data points trained and tested on. The model also offers a probabilistic approach to present genotypes in a more meaningful way and help enhance downstream genomics analyses. The talk targets audiences in breeding, genetics, and molecular biology, and data scientists who are interested in practical applications.
How to Talk about AI to Non-analysts - StampedeCon AI Summit 2017 (StampedeCon)
While artificial intelligence for self-driving cars and virtual assistants gets a lot of attention, communicating the needs, effectiveness and measurements of AI is complicated when speaking "geek". The work of an analyst, however, does not just involve conducting data analysis, but also communicating, championing and speaking simply when talking to the organization, clients and management.
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017 (StampedeCon)
This technical session provides a hands-on introduction to TensorFlow using Keras in the Python programming language. TensorFlow is Google's scalable, distributed, GPU-powered compute graph engine that machine learning practitioners use for deep learning. Keras provides a Python-based API that makes it easy to create well-known types of neural networks in TensorFlow. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to train neural networks of much greater complexity. Deep learning allows a model to learn hierarchies of information in a way that is similar to the function of the human brain.
Foundations of Machine Learning - StampedeCon AI Summit 2017 (StampedeCon)
This presentation will cover all aspects of modeling, from preparing data to training and evaluating the results. There will be descriptions of the mainline ML methods including neural nets, SVM, boosting, bagging, trees, forests, and deep learning. The common problems of overfitting and dimensionality will be covered, with discussion of modeling best practices. Other topics will include field standardization, encoding categorical variables, and feature creation and selection. It will be a soup-to-nuts overview of all the necessary procedures for building state-of-the-art predictive models.
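One of the preparation topics listed, encoding categorical variables, can be illustrated with a minimal one-hot encoder in plain Python (a sketch of the idea, not any particular library's API):

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns (categories, rows) where each row is a 0/1 vector aligned
    with the sorted category list, so models that expect numeric input
    can consume the data.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Libraries such as scikit-learn or pandas provide production versions of this transform, but the underlying mapping is exactly this: one indicator column per category.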
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem... (StampedeCon)
In this session, we’ll discuss approaches for applying convolutional neural networks to novel computer vision problems, even without having millions of images of your own. Pretrained models and generic image data sets from Google, Kaggle, universities, and other places can be leveraged and adapted to solve industry and business specific problems. We’ll discuss the approaches of transfer learning and fine tuning to help anyone get started on using deep learning to get cutting edge results on their computer vision problems.
Bringing the Whole Elephant Into View: Can Cognitive Systems Bring Real Soluti... (StampedeCon)
Like the story of the six blind men trying to explain the nature of an elephant, current research in cognitive computational systems attempts to identify the nature of an illness, human behavior, or socio-economical phenomenon, from their own perspective.
At present, there is no agreed upon definition for cognitive systems. One large communication corporation defines cognitive systems as a category of technology that uses artificial intelligence, machine learning and reasoning, to enable people and machines to interact more naturally. It also extends and magnifies human expertise and cognition to enable accurate decisions on time. Two of the most famous risk and financial advisory firms agree with that interpretation. A different large corporation, however, considers “cognitive systems” as merely marketing jargon.
If cognitive systems are going to help us solve challenging problems in medicine, economics, or other fields, three aspects must be considered in order to reveal the “true nature of the elephant”.
§ All facets of the problem must be addressed, like the main parts of the elephant had to be touched by the men.
§ These facets must be properly assembled, like the men needed to join hands around the elephant in order to understand what it was.
§ This assembly must be completed within sufficient time to anticipate future decisions. Just like the men needed to know what an elephant is before the next one charges them.
This talk will explain how agnostic (unsupervised, blinded) machine learning findings can be assembled by multiobjective and multimodal optimization research techniques to uncover a multifaceted view of the "elephant", in this case the human being (e.g., genomic variants, personality traits, brain images). It will also give real-world examples of how this knowledge will "extend human capabilities" by achieving an integrative assessment of the whole person in relation to their risk, allowing professionals to generate accurate person-centered policies: from personalized diagnoses to business opportunities to the prevention of outbreaks.
Automated AI: The Next Frontier in Analytics - StampedeCon AI Summit 2017 (StampedeCon)
This talk will walk through the important building blocks of Automated AI. Rajiv will highlight the current gaps in analytics organizations and how to close them using automated AI. Some of the issues discussed around automated AI are the accuracy of models, tradeoffs around control when using automation, interpretability of models, and integration with other tools. These issues will be highlighted with examples of automated analytics in different industries. The talk will end with some examples of how automated AI in the hands of data scientists and business analysts is transforming analytic teams and organizations.
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017 (StampedeCon)
Artificial Intelligence has entered a renaissance thanks to rapid progress in domains as diverse as self-driving cars, intelligent assistants, and game play. Underlying this progress is Deep Learning – driven by significant improvements in Graphic Processing Units and computational models inspired by the human brain that excel at capturing structures hidden in massive complex datasets. These techniques have been pioneered at research universities and digital giants but mainstream enterprises are starting to apply them as open source tools and improved hardware become available. Learn how AI is impacting analytics today and in the future.
Learn how AI is affecting the enterprise, including applications like fraud detection, mobile personalization, predicting failures for IoT, and text analysis to improve call center interactions. We look at practical examples of assessing the opportunity for AI, phased adoption, and lessons going from research, to prototype, to scaled production deployment.
A Different Data Science Approach - StampedeCon AI Summit 2017 (StampedeCon)
This session will focus on how to execute Data Science caliber efforts by creating teams with the attributes of Data Science to deliver meaningful results. As Data Scientists are harder to find and keep, this session should appeal to anyone who is either seeking an alternative approach to executing Data Science delivery or augmenting their current Data Science model with additional options.
Graph in Customer 360 - StampedeCon Big Data Conference 2017 (StampedeCon)
Enterprises typically have many data silos of partial customer data, and a common theme in big data projects is to use big data tools and pipelines to unify all siloed customer data into a single, queryable platform for improving all future customer interactions. This data often comes from billing, website traffic, logistics, and marketing, all in different formats with different properties. Graph provides a way to unify all of the data into a single place for tracking the flow of a user through the various silos. Graph can also be used for visualizations and analytics that are difficult in other systems.
In this talk we will explore the ways in which graph can be leveraged in a Customer 360 use case: what it can add to a more conventional system, and what the approach to developing a graph-based Customer 360 system should be.
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017 (StampedeCon)
This talk will go over how to build an end-to-end data processing system in Python, from data ingest, to data analytics, to machine learning, to user presentation. Developments in old and new tools have made this particularly possible today. In particular, the talk will cover Airflow for process workflows, PySpark for data processing, Python data science libraries for machine learning and advanced analytics, and building agile microservices in Python.
System architects, software engineers, data scientists, and business leaders can all benefit from attending the talk. They should learn how to build more agile data processing systems and take away some ideas on how their data systems could be simpler and more powerful.
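The end-to-end flow described above (ingest, process, model, present) can be sketched without any framework; the stage names and data are invented, and in a real system Airflow would orchestrate the stages and PySpark would do the heavy processing:

```python
def ingest():
    # Pretend we pulled raw events from a source system.
    return [{"user": "u1", "amount": 30}, {"user": "u1", "amount": 70},
            {"user": "u2", "amount": 10}]

def process(events):
    # Aggregate spend per user (what a PySpark job might do at scale).
    totals = {}
    for e in events:
        totals[e["user"]] = totals.get(e["user"], 0) + e["amount"]
    return totals

def score(totals, threshold=50):
    # Toy "model": flag users above a spend threshold.
    return {user: total > threshold for user, total in totals.items()}

def serve(flags):
    # What a microservice endpoint might return to the UI.
    return {"high_value_users": sorted(u for u, hot in flags.items() if hot)}

result = serve(score(process(ingest())))
print(result)  # {'high_value_users': ['u1']}
```

Keeping each stage a pure function is what makes it easy to later wrap the stages as Airflow tasks or swap the in-memory processing for PySpark.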
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017 (StampedeCon)
Big Data doesn’t have to just mean Hadoop any more. Big Data can be done in the cloud, using tools developed by the Cloud providers. This session will cover using Amazon AWS services to implement a Big Data application. We will compare and contrast different services from Amazon with the Hadoop equivalents.
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu... (StampedeCon)
Using big data isn’t about doing the same things we’ve always done just with different technologies. The technology advances that we’ve chosen to label as big data create the opportunity for wholly new kinds of solutions. Two of the key advances that are enabling new business capabilities are cloud-based data management platforms and streaming data processing and analytics.
In this session, Paul Boal will drill into the cloud-based streaming data architecture that has made possible EVŌ, a new breakthrough health and wellness platform. EVŌ uses a game-changing approach that leverages over 60 billion data points and a predictive analytics engine to intervene BEFORE someone becomes critically ill. All of this is possible by leveraging data from smartphones and wearable fitness devices along with advanced analytics which then help users develop and sustain positive behaviors. Attendees will learn how to create a cloud- based architecture that can receive data, apply multiple layers of dynamic business rules, and drive alerts and decisions through real-time stream processing using technologies including web services, Amazon DynamoDB and Kinesis, Drools, and Apache Spark.
Innovation in the Data Warehouse - StampedeCon 2016 (StampedeCon)
Enterprise Holding’s first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points to necessary for your own cluster, for example: cloud vs on premises, physical vs virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned with the pieces of our architecture worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary, architect or executive level.
Creating a Data Driven Organization - StampedeCon 2016 (StampedeCon)
Companies today are all focused on finding new consumption models to better utilize the data they produce. This presentation will provide insights and best practices for creating the organization and sponsorship necessary to set the foundation for success.
For this session, Dan will provide an overview of the process and methodologies he employs to establish and sustain a Data Driven Culture. Key topics will include:
Data Driven Culture
Executive Sponsorship
Organizational Structure – Collaboration Hubs and Bi-Modal Analytics
Role of Hadoop and Big Data as Part of Data Driven Culture
Using The Internet of Things for Population Health Management - StampedeCon 2016 (StampedeCon)
The Internet of (Human) Things is just beginning to take shape. The human body is an inexhaustible source of data about personal health, and the healthcare industry is just beginning to scratch the surface of the potential insights and value that will come from that data. While much of healthcare traditionally focuses on the episodic delivery of services, the Affordable Care Act is pushing healthcare providers, payers, and self-funded employer groups to look at ways to proactively encourage healthy behaviors. Providing personal health devices as a way to promote individual health is one way that healthcare is beginning to take advantage of IoT technologies. This session provides insight into how IoT is being leveraged in population health management through a solution jointly delivered by Amitech Solutions and Big Cloud Analytics. Attendees will learn how Hadoop is being used to gather personal device data from various vendors, integrate and analyze that information, differentiate trends across regional and cultural diversity, and provide personal recommendations and insights into health risks. This session presents one important way the healthcare industry is leveraging IoT.
Turn Data Into Actionable Insights - StampedeCon 2016 (StampedeCon)
At Monsanto, emerging technologies such as IoT, advanced imaging and geo-spatial platforms, along with molecular breeding, ancestry and genomics data sets, have made us rethink how we approach developing, deploying, scaling and distributing our software to accelerate predictive and prescriptive decisions. We created a cloud-based Data Science platform for the enterprise to address this need. Our primary goals were to perform analytics@scale and integrate analytics with our core product platforms.
As part of this talk, we will share our journey of transformation, showing how we enabled: a collaborative discovery analytics environment for data science teams to perform model development; provisioning data through APIs and streams; deploying models to production through our auto-scaling big-data compute in the cloud to perform streaming, cognitive, predictive, prescriptive, historical and batch analytics@scale; and integrating analytics with our core product platforms to turn data into actionable insights.
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first ever open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It also leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can then be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
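The first optimization mentioned, skipping vertices that have already converged, can be sketched in plain Python; this is only an illustration of the idea, not the STICD implementation:

```python
def pagerank_skip_converged(graph, d=0.85, tol=1e-10, eps=1e-6):
    """Power-iteration PageRank that stops updating vertices whose rank
    has already stabilized (per-vertex tolerance `eps`), saving work in
    later iterations. `graph` maps each vertex to its out-neighbors;
    dangling nodes are assumed absent for simplicity.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    converged = set()
    # Precompute in-links once.
    inlinks = {v: [] for v in nodes}
    for u, outs in graph.items():
        for v in outs:
            inlinks[v].append(u)
    while True:
        new_rank = dict(rank)
        delta = 0.0
        for v in nodes:
            if v in converged:
                continue  # skip work for vertices that stopped moving
            r = (1 - d) / n + d * sum(rank[u] / len(graph[u]) for u in inlinks[v])
            if abs(r - rank[v]) < eps:
                converged.add(v)
            delta += abs(r - rank[v])
            new_rank[v] = r
        rank = new_rank
        if delta < tol or len(converged) == n:
            break
    return rank

# On a 3-cycle every vertex should end up with rank 1/3.
ranks = pagerank_skip_converged({"a": ["b"], "b": ["c"], "c": ["a"]})
print(ranks)
```

Note that a skipped vertex keeps its last computed rank even if its in-neighbors later move slightly, which is exactly the approximation this optimization trades for speed.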
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in every iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph whose vertices were split by component. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
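The decomposition step the abstract describes can be sketched as follows: find the strongly connected components (Kosaraju's algorithm here, as one standard choice), condense them into a block-graph, and group the blocks into topological levels using Python's stdlib graphlib. This is an illustrative sketch of the preprocessing only, not of the distributed rank computation itself; all function names are invented.

```python
from graphlib import TopologicalSorter

def scc_levels(adj):
    """Condense a digraph (adjacency lists) into strongly connected
    components and return (component id per vertex, components
    grouped level-by-level in topological order)."""
    n = len(adj)
    # Pass 1: record DFS finish order on the original graph.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            node, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(adj[v])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the transpose in reverse finish order.
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    comp, c = [-1] * n, 0
    for u in reversed(order):
        if comp[u] == -1:
            stack, comp[u] = [u], c
            while stack:
                x = stack.pop()
                for v in radj[x]:
                    if comp[v] == -1:
                        comp[v] = c
                        stack.append(v)
            c += 1
    # Build the block-graph (maps each block to its predecessors)
    # and emit blocks one topological level at a time.
    block = {i: set() for i in range(c)}
    for u in range(n):
        for v in adj[u]:
            if comp[u] != comp[v]:
                block[comp[v]].add(comp[u])
    ts = TopologicalSorter(block)
    ts.prepare()
    levels = []
    while ts.is_active():
        ready = list(ts.get_ready())
        levels.append(sorted(ready))
        for b in ready:
            ts.done(b)
    return comp, levels
```

Each entry of `levels` is a set of components with no unresolved dependencies, which is exactly the unit that Levelwise PageRank can process without per-iteration communication.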
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos with a distributed data-ownership model that assigns clear ownership and responsibilities for each domain.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
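As a hypothetical sketch of what an automated validation step like the one above might look like, the column names and rules below are invented for illustration; a real deployment would source its rules from a catalog or schema registry.

```python
# Illustrative data-quality rules: column name -> predicate.
RULES = {
    "customer_id": lambda v: v is not None and str(v).strip() != "",
    "age":         lambda v: isinstance(v, int) and 0 <= v < 130,
    "email":       lambda v: isinstance(v, str) and "@" in v,
}

def validate(rows):
    """Split rows into (clean, errors); each error records the row
    index, column, and offending value for root-cause analysis."""
    clean, errors = [], []
    for i, row in enumerate(rows):
        bad = [(col, row.get(col)) for col, ok in RULES.items()
               if not ok(row.get(col))]
        if bad:
            errors.extend((i, col, val) for col, val in bad)
        else:
            clean.append(row)
    return clean, errors
```

Rejecting at the source and keeping a structured error log is what makes downstream lineage and root-cause analysis cheap.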
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can build an empowered data analytics ecosystem that delivers real value: data-driven decisions and a maximized return on their data investment.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation that is compact and supports fast, cache-friendly traversal of each vertex's neighbours.
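A minimal sketch of the CSR layout in Python, with invented helper names: an offsets array delimits each vertex's slice of one flat targets array, so iterating a vertex's out-neighbours is a contiguous scan.

```python
def to_csr(n, edges):
    """Build CSR from an edge list: offsets[u]..offsets[u+1]
    indexes u's out-neighbours inside the flat targets array."""
    offsets = [0] * (n + 1)
    for u, _ in edges:              # count out-degree of each vertex
        offsets[u + 1] += 1
    for i in range(n):              # prefix-sum into slice boundaries
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    pos = offsets[:-1]              # copy: next free slot per vertex
    for u, v in edges:
        targets[pos[u]] = v
        pos[u] += 1
    return offsets, targets

def neighbours(offsets, targets, u):
    """Out-neighbours of u as a contiguous slice."""
    return targets[offsets[u]:offsets[u + 1]]
```

Two flat arrays instead of per-vertex lists is what gives CSR its small footprint and good locality, at the cost of expensive edge insertion.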
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
2. Welcome!
Christopher Surdak, JD
Technology Evangelist, Award-Winning Author, Engineer, Data Guy, Rocket Scientist,
and Global Expert in Information Governance, Analytics, Privacy and Social Media
Held roles with leading companies such as Accenture, Siemens, Dell and Citibank. Began his career with Lockheed Martin Astrospace, where he was a spacecraft systems engineer and rocket scientist.
Holds a Juris Doctor from Taft University; an Executive Master's in Technology Management and a Moore Fellowship from the University of Pennsylvania; a Master's Certificate in Information Security from Villanova University; and a BS in Mechanical Engineering from Pennsylvania State University.
Wrote “Data Crush: How the Information Tidal Wave is Driving New Business Opportunities,” recipient of getAbstract’s International Book of the Year Award, 2014.
Benjamin Franklin Innovator of the Year, 2015 by the Wharton Club of DC.
Honored Consultant, FutureTrek Community, Beijing, China
12. What’s the Issue, What’s at Stake?
The best companies must walk the razor’s edge between two extremes: Creepiness and Intimacy.
• E-Coupons
• I-Coupons
• “Liking”
• Geo-Tracking
• Predictive Shipping
• Suggestion lists
• Cookies
• Reverse Grouponing
• Behavior Modeling
• Behavior Manipulation
• Needs Anticipation
14. Why Care?
What if analytics and predictive technologies can change customer behavior by just 1%? That’s $18 billion in increased revenue! Cha-Ching!!! In reality, these technologies easily double or triple results.
16. What is for sale?
Top 6 Companies in the World, by Market Capitalization, June 2014 (Source: PwC)
What do they sell, by rank:
1. Phones?
2. Oil
3. Nothing?
4. Money
5. Oil
6. Stuff
18. Companies like Google, Facebook, Yahoo, Twitter and Microsoft (Bing) spend tens of billions of dollars per year on servers, storage, networking and electricity.
How much did you pay to use their services?
YOU ARE THE PRODUCT!
29. Email: Christopher.w.surdak@hp.com
Twitter: @csurdak
M: 714.398.4874
If you’d like to learn more, check out “Data Crush,” getAbstract’s International Book of the Year, 2014.
Also see my columns in European Business Review, HP Matter and TechBeacon Magazines, and my blogs on HP.com, EBReview, ChinaBusinessReview, Dataconomy.com, Inc. Magazine, and About.com.
Look for my second book, “Jerk,” coming in 2016, foreword by Don Tapscott, business guru and best-selling author of “Wikinomics,” “Growing Up Digital” and 12 other best-selling business books.
And thereafter, book three, “Rupture,” coming in late 2016.
Thank You
30. Thank You!
An “expert” is anyone who is one chapter ahead of you in whatever book you happen to be reading.