Alexey Zinoviev presented this paper at the JPoint'15 conference: javapoint.ru/talks/#zinoviev
This paper covers the following topics: Java, JPA, Morphia, Hibernate OGM, Spring Data, Hector, Kundera, NoSQL, Mongo, Cassandra, HBase, Riak
JavaDayKiev'15 Java in production for Data Mining Research projects (Alexey Zinoviev)
Alexey Zinoviev presented this paper at the JavaDayKiev'15 conference: http://javaday.org.ua/kyiv/#schedule
This paper covers the following topics: Java, Spark, Hadoop, Mahout, MLlib, Weka, Machine Learning, Data Mining
Big Data in 200 km/h | AWS Big Data Demystified #1.3 (Omid Vahdaty)
What we're about
A while ago I entered the challenging world of Big Data. As an engineer, I was not very impressed with the field at first. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article: it covers only a small fraction of the technologies in the Big Data industry…
Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS infrastructure to answer the basic questions of anyone starting their way in the big data world.
How to transform data (TXT, CSV, TSV, JSON) into Parquet or ORC?
Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL?
How to handle streaming?
How to manage costs?
Performance tips?
Security tips?
Cloud best-practice tips?
Some of our online materials:
Website:
https://big-data-demystified.ninja/
Youtube channels:
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber
Meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
https://www.meetup.com/Big-Data-Demystified
Facebook Group :
https://www.facebook.com/groups/amazon.aws.big.data.demystified/
Facebook page (https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/)
Audience:
Data Engineers
Data Scientists
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D
Amazon AWS Big Data Demystified | Introduction to streaming and messaging flu... (Omid Vahdaty)
Amazon AWS Big Data Demystified meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
Introduction to streaming and messaging: Flume, Kafka, SQS, Kinesis
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ... (DataStax Academy)
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Denis Reznik "My database can't handle the load. What should I do?" (Fwdays)
During the talk we will look at a number of principles and techniques that let your database handle a heavier load. P.S. All examples and demos will be run on MS SQL Server. Any resemblance to other databases is coincidental, but quite likely :) so what you learn in this talk may be useful even if you work with a different database.
Solr cloud the 'search first' nosql database extended deep dive (lucenerevolution)
Presented by Mark Miller, Software Engineer, Cloudera
As the NoSQL ecosystem looks to integrate great search, great search is naturally beginning to expose many NoSQL features. Will these Goliaths collide? Or will they remain specialized while intermingling, two sides of the same coin?
Come learn about where SolrCloud fits into the NoSQL landscape. What can it do? What will it do? And how will the big data, NoSQL, and search ecosystem evolve? If you are interested in Big Data, NoSQL, distributed systems, the CAP theorem and other hype-filled terms, then this talk may be for you.
My talk on NoSQL at OGF29. [Updated with the OSCON'10 presentation!] Updates do not work reliably on SlideShare, so the latest version is also on my blog.
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big... (rhatr)
You’ve got your Hadoop cluster, you’ve got your petabytes of unstructured data, you run mapreduce jobs and SQL-on-Hadoop queries. Something is still missing though. After all, we are not expected to enter SQL queries while looking for information on the web. Altavista and Google solved it for us ages ago. Why are we still requiring SQL or Java certification from our enterprise bigdata users? In this talk, we will look into how integration of SolrCloud into Apache Bigtop is now enabling building bigdata indexing solutions and ingest pipelines. We will dive into the details of integrating full-text search into the lifecycle of your bigdata management applications and exposing the power of Google-in-a-box to all enterprise users, not just a chosen few data scientists.
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar (DataStax)
Looking to strengthen your expertise in Cassandra and DataStax Enterprise? This DataStax Training Webinar will arm you with the knowledge and hands-on skills to get the most out of your DataStax Enterprise environment. If you’ve already taken a DataStax training, consider this a free refresher. Considering training? Then this is a solid intro for developers and admins on your team.
This webinar will highlight the training curriculum and drill into each of the Cassandra expert-led courses so you can determine what meets your needs. Training topics:
Core Concepts, Skills, and Tools
Operations & Performance Tuning
Data Modeling
Using Apache Solr within DataStax Enterprise
And more!
A comprehensive introduction to the Big Data world in the AWS cloud: Hadoop, streaming, batch, Kinesis, DynamoDB, HBase, EMR, Athena, Hive, Spark, Pig, Impala, Oozie, Data Pipeline, security, cost, best practices
Polyglot Database - Linuxcon North America 2016 (Dave Stokes)
Many relational databases are adding NoSQL features to their products. So what happens when you can get direct access to the data as a key/value pair, or store an entire document in a column of a relational table, and more?
What Your Database Query is Really Doing (Dave Stokes)
Do you ever wonder what your database server is REALLY doing with that query you just wrote? This is a high-level overview of the process of running a query.
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons (DataStax)
We'll be covering some aspects of our architecture, highlighting differences between MongoDB and Cassandra. We'll go in depth to explain why Cassandra is a better choice for our general purpose Application Platform (SHIFT) as well as our Media Buying Analytics tool (the SHIFT Media Manager). We'll be going over common design patterns people might be familiar with coming from a background with MongoDB and highlight how Cassandra would be used as a better alternative. We'll also touch more on cqlengine which is nearing feature completeness as the Cassandra object mapper for Python.
Evgeny Bobrov "Powered by OSS. Scalable stream processing and analysis ... (Fwdays)
Open-source technologies such as Microsoft Orleans and ElasticSearch are key elements of the YouScan architecture. In this talk I will cover how they help cope with constantly growing volumes of social media data, and how the YouScan architecture has evolved.
A talk given by Ted Dunning in February 2013 on Apache Drill, an open-source, community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
Given at a free DevelopMentor webinar. A high-level overview of big data and the need for Hadoop. Also covers Pig, Hive, Yarn, and the future of Hadoop.
[db tech showcase Tokyo 2016] E32: My Life as a Disruptor by Jim Starkey (Insight Technology, Inc.)
I’ve championed or developed four distinct disruptive technologies in database management. I started working on databases for the ARPAnet - the precursor of the Internet which had 47 nodes and was the largest network on earth. I advocated relational technology when it was considered an academic curiosity and introduced a new concurrency control technology that made consistency practical. More recently I created a radically new architecture for distributed ACID SQL databases. Now, my project is a critical re-evaluation of where we are, how we got here, and where we should be going. It’s going to be a wild ride.
BigData Tools master class for HappyDev'15 (Alexey Zinoviev)
Danila, BigData Tool Master,
built a Hadoop cluster,
launched a Dataset.
He ran Scala scripts
on Spark all the time
and wrote to HDFSssss
If the talk "When all data becomes big..." is about questions and answers, in this master class we will tread on the home turf of BigData developers.
We will start with the Hadoop classics, feel the pain of a MapReduce job, poke at Pig + Hive, then waltz smoothly over to Spark and write some code in a light and convenient pipeline style.
Who this master class suits well: you can read and understand Java code at least at a Junior level, you can write SQL queries, you attended at least one university class on calculus or probability theory, and you have recently been, or soon will be, put on a project where you need to work hands-on with the menagerie listed above. Or you are simply curious to see the power of data crunchers written in Java, and you have a history of bad experience with NoSQL/SQL as a store that was responsible for everything, including analytics.
HappyDev'15 Keynote: When all data becomes big... (Alexey Zinoviev)
This moment will inevitably come if your project or business was not built to flare up like a Phoenix in the flames of budgets. It is important not to miss it and to begin the rite of scaling as early as possible.
However, simply unleashing Hadoop on your logs, pouring data from PostgreSQL into Cassandra, or mercilessly tuning nginx and the JVM will not suit every situation.
You should always start from the tasks, from an idea of the analytics system, or from a predefined level of system responsiveness. In this talk I would like to focus not on the tooling that matters so much to a developer but, on the contrary, on the different types of questions and pains that customers bring to us in the real world, where nobody cares about your results on Kaggle (an online data analysis competition) or synthetic performance benchmarks, and on the process of finding answers to those questions. In the real world, the final idea of an application can change beyond recognition in an instant.
Come along, and we will go through both good cases and typical mistakes in building applications.
Who this talk suits well: those who are not too familiar with the BigData concept, or who know the developer tooling well but are not quite clear what it is all for. And if you are going to the master class, drop in; it won't hurt.
Alexey Zinoviev presented this paper at the Joker'15 conference: http://jokerconf.com/talks/zynovyev/
This paper covers the following topics: Java, Morphia, Hibernate OGM, JPA, Spring Data, Kundera, NoSQL, Mongo, Jongo
Alexey Zinoviev presented this paper at the JBreak'16 conference: http://jbreak.ru/talks/zinoviev.html
This paper covers the following topics: Java, Hadoop, HDFS, MapReduce, Join Algorithms, HDP
[db tech showcase Tokyo 2016] E34: Oracle SE - RAC, HA and Standby are Still ... (Insight Technology, Inc.)
Standard Edition (SE) is alive and well. Maybe it had some growing pains over the last year, BUT it is here to stay! SE is a powerful database, albeit with some limitations, whether it is used in a cloud-based environment or on premises. In this session we will discuss Oracle SE and review some of the recent changes and the introduction of the new kid on the block, Standard Edition 2 (SE2). Topics that will be discussed include moving between Editions, High Availability, Disaster Recovery as well as Backup and Recovery.
How does a company deal with huge, unstructured volumes of data in the age of Web 2.0? Thanks to an invitation from Namics, Switzerland's largest internet agency, we were able to speak on this highly topical subject on 09.09.2011 as part of their annual training event. Our architect Christian Gügi spoke on the topic "Big Data in enterprise use with Hadoop".
About the content:
All over the world, interested people met for the NoSQL Summer 2010 to read, understand and discuss papers on NoSQL. These included in particular the papers on Google's Chubby, MapReduce & BigTable from 2006, but also Cassandra (Facebook), Dynamo (Amazon), Hadoop (Apache) and many more. In the meantime the subject area has expanded, a market is growing, more and more products are establishing themselves, and many companies are taking up the topic. NoSQL is no longer a buzzword. But what is meant by NoSQL, when and for what is it used, and which products exist? The talk explains these questions using Hadoop and Lily, and thereby draws the connection to current content management systems.
Did you miss Scala Days 2015 in San Francisco? Have no fear! BoldRadius was there and we've compiled the best of the best! Here are the highlights of a great conference.
JDD 2016 - Michal Matloka - Small Intro To Big Data (PROIDEA)
Pig, Hive, Flink, Kafka, Zeppelin... if you now wonder whether someone just tried to offend you or those are just Pokemon names, then this talk is just for you! Big Data is everywhere and new tools for it are released almost at the speed of new JavaScript frameworks. During this entry-level presentation we will walk through the challenges that Big Data presents, reflect on how big is big, and introduce the currently most popular (mostly open source) tools. We'll try to spark interest in Big Data by showing application areas and by suggesting ideas you can dive into later.
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... (Felix Gessert)
The unprecedented scale at which data is consumed and generated today has shown a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting requests loads and data volumes 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer amount of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides for e.g. teaching, contact us at gessert at informatik.uni-hamburg.de - we'll send you the PowerPoint.
A comprehensive introduction to NoSQL solutions in the big data landscape. Graph store? Column store? Key-value store? Document store? Redis or Memcached? DynamoDB? MongoDB? HBase? Cloud or open source?
Presented at Codemotion Warsaw 2016 and JDD 2016.
Pig, Hive, Flink, Kafka, Zeppelin... if you now wonder whether someone just tried to offend you or those are just Pokemon names, then this talk is just for you!
Big Data is everywhere and new tools for it are released almost at the speed of new JavaScript frameworks. During this entry-level presentation we will walk through the challenges that Big Data presents, reflect on how big is big, and introduce the currently most popular (mostly open source) tools.
We'll try to spark interest in Big Data by showing application areas and by suggesting ideas you can dive into later.
The world has changed, and one huge server won’t do the job anymore. When you’re dealing with vast amounts of data that grow all the time, the ability to scale out is your savior. Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
This lecture covers the basics of Apache Spark and distributed computing, and the development tools needed for a functional environment.
This is the second part of the Spark 2 new features overview.
This topic covers API changes, Structured Streaming, Output Modes, Apache Kafka, Kafka Direct Streams, and the Kafka source/sink in Spark 2.2.
This is the first part of the Spark 2 new features overview.
This topic covers API changes, Structured Streaming, Encoders, Memory Management in Spark, Tungsten issues, and Catalyst features.
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr... (Alexey Zinoviev)
Alexey Zinoviev presented this paper at the PiterPy conference: http://it-sobytie.ru/events/3275
This paper covers the following topics: Data Mining, Machine Learning, Python, SciPy, NumPy, Pandas, NetworkX, Scikit-learn, Octave, R language
Thorny path to the Large-Scale Graph Processing (Highload++, 2014) (Alexey Zinoviev)
Alexey Zinoviev presented this paper at the Highload++ conference: http://www.highload.ru/2014/abstracts/1516.html
This paper covers the following topics: Pregel, Graph Theory, Giraph, Okapi, GraphX, GraphChi, Spark, Shortest Path Problem, Road Network, Road Graph
Joker'14 Java as a fundamental working tool of the Data Scientist (Alexey Zinoviev)
Alexey Zinoviev presented this paper at the Joker conference: http://jokerconf.com/#zinoviev
This paper covers the following topics: Data Mining, Machine Learning, Mahout, Spark, MLlib, Python, Octave, R language
Alexey Zinoviev presented this paper at the Second Thumbtack Technology Expert Day.
This paper covers the following topics: Data Mining, Machine Learning, Octave, R language
YouTube: http://youtu.be/kGIP6XeWiaA
EST: Smart rate (Effective recommendation system for Taxi drivers based on th... (Alexey Zinoviev)
Presentation from the EST geo hackathon about an effective recommendation system for taxi drivers based on their order history.
Habrahabr article: http://habrahabr.ru/company/est/blog/225285/
Android Geo Apps in Soviet Russia: Latitude and longitude find you (Alexey Zinoviev)
Alexey Zinoviev presented this paper at DroidCon Moscow 2014 http://ru.droidcon.com/2014/android-geo-apps/ and at Thumbtack Technology Expert Day.
The YouTube video is here: https://www.youtube.com/watch?v=AstDJbcT2lQ
This paper covers the following topics: Android, Google Maps, Open Street Maps, Yandex Map Kit, HERE Maps, GPS, localization.
Keynote on JavaDay Omsk 2014 about new features in Java 8 (Alexey Zinoviev)
Alexey Zinoviev presented this paper at JavaDay Omsk 2014. The paper covers the following topics: Java 8, Stream API, method references, the roadmap for Java 9, default methods in interfaces, SAM, functional interfaces.
Big data algorithms and data structures for large scale graphs (Alexey Zinoviev)
Alexey Zinoviev presents graph processing tools and new algorithms for the shortest path problem at DUMP-2014 (a popular Ural IT conference)
Keywords: Pregel, Apache Giraph, shortest path problem
Video: http://youtu.be/MGccYYrP9f0
Choosing a NoSQL database for your project: "Don't bite off more than you can chew" (Alexey Zinoviev)
Alexey Zinoviev talks about choosing one of the following databases at HappyDev 2013: CouchDB, Neo4j, Mongo, Cassandra, HBase, Riak
The article "Choice of NoSQL database for your project: Don't bite off more than you can chew" was presented at HappyDev 2013 (an IT conference in Omsk) by Alexey Zinoviev
The main idea of this article is a comparison of the most popular NoSQL databases: CouchDB, Cassandra, MongoDB, Riak, Neo4j, HBase
Big Data algorithms and data structures for large-scale graphs (Alexey Zinoviev)
The article "Algorithms and Data Structures Big Data for large-scale graphs" was presented at the School-conference on Mathematical Problems of Informatics http://omskconf2013.oscsbras.ru/index.html by Alexey Zinoviev
MyBatis and Hibernate on one project. How to make them get along? (Alexey Zinoviev)
Alexey Zinoviev presented this paper at CodeFest 2013, Novosibirsk.
The paper covers the following topics: Hibernate, MyBatis, ORM, databases, SQL, JDBC, patterns, XML
Video invitation: http://youtu.be/8KObW8pZ9e0
Video of the talk: http://youtu.be/Tm5rl4ObWBA
This lecture is about the trip of two Russian high-school students to Google I/O 2013.
Alexey Zinoviev and Alexey Korovyansky will show photos and plunge you into the depths of America.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
GraphSummit Paris - The art of the possible with Graph Technology (Neo4j)
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
May Marketo Masterclass, London MUG May 22 2024.pdf
JPoint'15 Mom, I so wish Hibernate for my NoSQL database...
1. Speaker: Alexey Zinoviev
Mom, I so wish Hibernate for my NoSQL database...
2. About
● I am a scientist. My areas of interest include machine learning, traffic jam prediction, and Big Data algorithms.
● But I'm also a programmer, so I'm interested in NoSQL databases, Java, Android, Hadoop, and Spark.
15. Flower varieties I
Data Model | Performance | Scalability | Flexibility | Complexity | Functionality
Key–value Stores | high | high | high | none | variable (none)
Column Store | high | high | moderate | low | minimal
Document Store | high | variable (high) | high | low | variable (low)
Graph Database | variable | variable | high | high | graph theory
Relational Database | variable | variable | low | moderate | relational algebra
16. Flower varieties II
Database | Data model | Query API | Data storage system
Cassandra | Column Family | Thrift | Memtable/SSTable
CouchDB | Documents | Map/Reduce | Append-only B-tree
HBase | Column Family | Thrift, REST | Memtable/SSTable on HDFS
MongoDB | Documents | Cursor | B-tree
Neo4j | Edges/Vertices | Graph | On-disk linked lists
Riak | Key/Value | Nested hashes, REST | Hash
17. Flower varieties III
Database | Secondary indexes | MapReduce | Free-form queries
Cassandra | yes | no | CQL (no joins)
CouchDB | yes | JavaScript | Views
HBase | no | Hadoop | weak support
MongoDB | yes | JavaScript | Full stack without joins
Neo4j | yes (with Lucene) | no | Search, graph operations
Riak | yes | JavaScript, Erlang | weak support, Lucene
20. Hector
● CRUD + column iteration, partial support of the JPA standard
● Consistency level can be set per column family and per operation type (read, write)
● Based on the Thrift RPC protocol (1 response per 1 request)
● Mapping a Collection (POJO property) to columns
● Inheritance through 'single table'
● Custom converters to/from byte[]
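The "custom converters to/from byte[]" idea from the Hector slide can be illustrated with a minimal plain-Java sketch. The names `ByteConverter`, `StringConverter` and `UuidConverter` are hypothetical and are not Hector's own types; the sketch only shows the round trip between a typed POJO property and the raw bytes stored in a column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical converter contract: a typed value to and from raw column bytes. */
interface ByteConverter<T> {
    byte[] toBytes(T value);
    T fromBytes(byte[] bytes);
}

/** Stores a String as UTF-8 bytes. */
final class StringConverter implements ByteConverter<String> {
    public byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

/** Stores a UUID as 16 bytes: most significant long first, then least significant. */
final class UuidConverter implements ByteConverter<UUID> {
    public byte[] toBytes(UUID value) {
        return ByteBuffer.allocate(16)
                .putLong(value.getMostSignificantBits())
                .putLong(value.getLeastSignificantBits())
                .array();
    }

    public UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        // Arguments are evaluated left to right, so the first long read is the most significant one.
        return new UUID(buffer.getLong(), buffer.getLong());
    }
}
```

A mapper built this way would pick a converter per property type, so the entity class itself never deals with byte arrays.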
21.
22.
23. DataStax Java Driver
● DataStax developed a new protocol that doesn't have the RPC limitations (asynchronous I/O)
● Low-level API with simple mapping
● Works with CQL3
● QueryBuilder resembles the Criteria API
● Accessor-annotated interfaces
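To show what "accessor-annotated interfaces" means in practice, here is a rough plain-Java sketch of the general technique: an interface method carries its statement in an annotation, and a dynamic proxy reads that annotation at call time. The annotation `CqlQuery`, the interface `FlowerAccessor`, the table `flowers` and the class `AccessorFactory` are all hypothetical; this is not the driver's implementation, and the proxy returns the statement text instead of executing it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.Arrays;

/** Hypothetical annotation carrying a CQL string; not the driver's own annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CqlQuery {
    String value();
}

/** Hypothetical accessor: each method declares the statement it stands for. */
interface FlowerAccessor {
    @CqlQuery("SELECT * FROM flowers WHERE id = ?")
    String findById(int id);
}

final class AccessorFactory {
    /** Builds a proxy that returns the statement and its bound arguments instead of executing them. */
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] { type },
                (proxy, method, args) -> {
                    CqlQuery query = method.getAnnotation(CqlQuery.class);
                    if (query == null) {
                        // Methods without a statement (including toString) are not supported in this sketch.
                        throw new UnsupportedOperationException(method.getName());
                    }
                    return query.value() + " " + Arrays.toString(args);
                });
    }
}
```

Calling `AccessorFactory.create(FlowerAccessor.class).findById(42)` yields the statement text followed by the bound arguments; a real mapper would send both to the cluster at that point.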
27. Other Cassandra OMs
● Achilles: well documented and provides transactions
● Astyanax: connection pool, thread safety and pagination
● Pelops: old project, a good "bicycle" (a reinvented wheel)
● PlayORM: a strange but powerful thing
● Easy-Cassandra: simple annotations + CRUD
● Thrift as a low-level API
29. Morphia
● Integrated with Spring, Guice and other DI frameworks
● Lifecycle method annotations (@PrePersist, @PostLoad)
● Built on top of the Mongo Java Driver
● Much better than old-style querying with BSON objects
● Fluent Query API: ds.createQuery(MyEntity.class).filter("foo >",12).order("date, -foo");
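The fluent query on the Morphia slide packs two string conventions into its arguments: a condition is a field name followed by an operator ("foo >"), and a sort list marks descending fields with a leading minus ("date, -foo"). The sketch below decodes those conventions in plain Java. `FluentStrings` is a hypothetical helper, not part of Morphia, and treating a bare field name as equality is an assumption made here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Morphia: decodes the string conventions of the fluent query. */
final class FluentStrings {
    /** Splits a condition such as "foo >" into {field, operator}; a bare field is assumed to mean "=". */
    static String[] parseCondition(String condition) {
        String[] parts = condition.trim().split("\\s+", 2);
        String operator = parts.length == 2 ? parts[1].trim() : "=";
        return new String[] { parts[0], operator };
    }

    /** Turns "date, -foo" into ["date ASC", "foo DESC"]; a leading '-' marks descending order. */
    static List<String> parseOrder(String order) {
        List<String> result = new ArrayList<>();
        for (String raw : order.split(",")) {
            String field = raw.trim();
            if (field.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            if (field.startsWith("-")) {
                result.add(field.substring(1).trim() + " DESC");
            } else {
                result.add(field + " ASC");
            }
        }
        return result;
    }
}
```

Read this way, the slide's query asks for entities whose `foo` is greater than 12, sorted by `date` ascending and then by `foo` descending.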
36. Mongo - mongo
● Jongo: mongo shell queries in Java code
● EclipseLink: varying levels of support for different NoSQL databases
● MJORM: hosted on Google Code, XML mapping + MQL (SQL-like syntax for extracting Mongo data)
● DataNucleus: supports many "J" standards such as JDO and JPA
38. Hibernate OGM
● Java Persistence (JPA) support for NoSQL solutions
● JP-QL queries are converted into native backend queries
● Uses Hibernate Search as the indexing engine and for full-text queries
● You can call flush() and commit() and demarcate transactions
● It supports only MongoDB, Neo4j, Infinispan, and Ehcache
43. Polyglot Persistence
● Redis: Rapid access for reads and writes. No need to be durable.
● RDBMS: Needs transactional updates and has a tabular structure.
● Riak: Needs high availability across multiple locations. Can merge inconsistent writes.
● Neo4j: Rapidly traverse links between friends and ratings.
● MongoDB: Lots of reads, infrequent writes. Powerful aggregation mechanism.
● Cassandra: Large-scale analytics on a large cluster. High volume of writes on multiple nodes.
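The polyglot persistence slide is essentially a routing table from workload type to storage engine. A minimal sketch of that idea in plain Java follows; the `Workload` enum and the `StoreRouter` class are hypothetical names, and the mapping simply restates the slide's pairings rather than prescribing them.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical workload categories, one per bullet on the slide. */
enum Workload {
    SESSION_CACHE,            // rapid reads and writes, durability not required
    TRANSACTIONAL_RECORDS,    // transactional updates, tabular structure
    MULTI_SITE_AVAILABILITY,  // high availability across locations, mergeable writes
    FRIEND_AND_RATING_LINKS,  // fast traversal of links
    READ_HEAVY_AGGREGATION,   // many reads, infrequent writes, aggregation
    WRITE_HEAVY_ANALYTICS     // large-scale analytics, high write volume
}

/** Hypothetical router that answers "which store should own this kind of data?". */
final class StoreRouter {
    private final Map<Workload, String> routes = new EnumMap<>(Workload.class);

    StoreRouter() {
        routes.put(Workload.SESSION_CACHE, "Redis");
        routes.put(Workload.TRANSACTIONAL_RECORDS, "RDBMS");
        routes.put(Workload.MULTI_SITE_AVAILABILITY, "Riak");
        routes.put(Workload.FRIEND_AND_RATING_LINKS, "Neo4j");
        routes.put(Workload.READ_HEAVY_AGGREGATION, "MongoDB");
        routes.put(Workload.WRITE_HEAVY_ANALYTICS, "Cassandra");
    }

    /** Returns the name of the store the slide pairs with the given workload. */
    String storeFor(Workload workload) {
        return routes.get(workload);
    }
}
```

In a real application each entry would point to a repository or client object rather than a name, but the decision stays the same: the workload chooses the store.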
44. Kundera: Polyglot approach
● Atomicity guarantee and transaction management
● Strictly JPA 2.1 compatible
● It supports Cassandra, Mongo, HBase, Redis, Neo4j, etc.
● @Embedded and @ElementCollection for ColumnFamily and nested documents
● OneToMany, OneToOne, ManyToMany relationships
● JPQL support is incomplete and varies by database
46. If you are developing a project for...
● a U.S. company - Morphia or Hector
● a transnational company with a history of mergers - Kundera
● a Deutsche Bank - Hibernate OGM or Spring Data
● a Russian company, or one in the former U.S.S.R. - native drivers are the best approach (you will spend a lot of time)
● … just to play - Jongo, Easy-Cassandra and EclipseLink
47. Your Country Needs You!
● Know your cases and data!
● Choose the right database!
● Choose the right framework!
● In Soviet Russia, backends are waiting for you!