Alternative Data News. 26 August 2020

The AltDataNewsletter by CloudQuant

Finding sources and uses for alternative data can be difficult. At CloudQuant we regularly read and search the internet for new sources of data that can be used in our mission to find alpha signals and build quantitative trading strategies. We recognize that we are technology and data junkies so we wrote our own crawler that specifically seeks out web pages, posts, and news articles that give us a snapshot of what is going on in the world of Alt Data. The following is a collection of articles that we think you will find interesting from the past week.


The 2020 Stock Market’s Collapse and Recovery in 60 seconds

The S&P 500 is an index of the 500 largest publicly traded companies in the U.S. This chart shows daily price movements for each of these companies since the beginning of the year, organized by sector. The size of each company is defined by its market capitalization based on its valuation as of August 18th.

Data is from IEX Cloud and Yahoo Finance, and the visualization was done in JavaScript with d3. An interactive version of this chart is maintained at Chartfleau.

2020-08-20 Read the full story…

CloudQuant Thoughts : Another great post from data is beautiful at Reddit.

Northern Trust Launches New Environmental Data Reporting

Northern Trust announced it has further developed its environmental, social and governance (ESG) risk exposure analytics capabilities to include new reporting for key environmental data categories.

The enhancement allows Northern Trust’s clients – typically asset owners such as pension funds – to interrogate specific environmental risk indicators for their investments. It also delivers a new ‘ESG Insights: Environment’ report, providing investors with access to environmental analytics using a range of factors – including carbon footprint analysis.

Institutions can use the resulting information to engage with asset managers and stakeholders around the environmental impact of their investment portfolio, as well as to generate data and analytics for publishing in their annual disclosures. The detailed information provided supports clients in determining if they are meeting sustainable investment goals and satisfying ever-increasing regulatory requirements.

2020-08-21 09:59:00+00:00 Read the full story…
Weighted Interest Score: 3.3790, Raw Interest Score: 1.9178,
Positive Sentiment: 0.3044, Negative Sentiment 0.0609

CloudQuant Thoughts : As well as these fabulous blog posts and a state of the art backtesting and research environment, CloudQuant also provides access to Alternative Data Sets including an ESG set. Head over to our Data Catalog for more information.

An AI Just Confirmed the Existence of 50 Planets By Digging Through NASA Data

The search for other planets just got a huge upgrade. A machine learning algorithm just confirmed the existence of 50 new planets.

The team behind the algorithm, from Warwick University, fed it huge datasets originating from NASA’s now-retired Kepler mission and the Transiting Exoplanet Survey Satellite (TESS), a space telescope that launched in 2018.

The scientists are hoping their research could pave the way for future planet validation techniques. Current techniques for spotting and confirming the existence of other planets are easily swayed by noise, interference of an object in the background, or even errors in the camera.

The team trained their algorithm by teaching it the difference between confirmed planets and false positives. They then unleashed it on a separate dataset that has yet to be validated for planetary candidates.

2020-08-25 09:59:00+00:00 Read the full story…

CloudQuant Thoughts : Now that’s what I call Alternative Data.

Homebuilder Stocks Extend Gains Amid Surging Home Sales

Several leading homebuilder stocks rocketed to fresh all-time highs Friday after data revealed healthy buying interest and tightening supply in the housing market despite the ongoing pandemic. According to the National Association of Realtors, sales of existing homes soared 24.7% in July month over month, with the median price of a home sold last month increasing 8.5% from a year ago to $304,100, per CNBC. Meanwhile, supply of existing homes contracted 21.1% annually as many sellers remained on the sidelines amid the economic uncertainty.
2020-08-24 13:28:13.662000+00:00 Read the full story…
Weighted Interest Score: 4.3571, Raw Interest Score: 1.9703,
Positive Sentiment: 0.2542, Negative Sentiment 0.1695

CloudQuant Thoughts : Housing data is a dataset I do not see many people utilize, yet it is obviously a major early indicator of the health of an economy.

A Python Tool for Data Cleaning – PyJanitor

As a data scientist, you are more or less going to spend 60-70% of your time cleaning and preparing your data. The process of cleaning, encoding and transforming your raw data in order to bring it into a format that the machine learning model can understand is called Data Pre-processing. This process is often long and cumbersome and most developers consider it to be the least favourite part of a project. Despite being tedious, it is one of the most important techniques that need to be implemented. To simplify the overall process and make it a bit more interesting, Python introduces a package called PyJanitor, a Python tool for data cleaning.

This article deals with an overview of what pyjanitor is, how it works and a demonstration of using this package to clean dirty data.
2020-08-26 11:30:46+00:00 Read the full story…
Weighted Interest Score: 2.8315, Raw Interest Score: 1.4236,
Positive Sentiment: 0.1349, Negative Sentiment 0.0599

CloudQuant Thoughts : A neat pandas extension that makes data cleanup a little simpler.
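
To give a feel for the kind of cleanup the article is talking about, here is a minimal sketch in plain Python. It does not call pyjanitor or pandas; the helper names clean_name and clean_rows and the sample rows are our own made-up illustrations, so treat it as a rough picture of the idea rather than the package's API.

```python
import re


def clean_name(name):
    """Normalize one column name: trim, lowercase, snake_case."""
    name = name.strip().lower()
    # Replace any run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9a-z]+", "_", name)
    return name.strip("_")


def clean_rows(rows):
    """Clean the column names and drop rows whose values are all empty."""
    cleaned = []
    for row in rows:
        if all(value in (None, "") for value in row.values()):
            continue  # skip completely empty rows
        cleaned.append({clean_name(key): value for key, value in row.items()})
    return cleaned


# Hypothetical messy input, invented for this example.
raw = [
    {" Ticker Symbol ": "ABC", "Close Price ($)": "10.50"},
    {" Ticker Symbol ": "", "Close Price ($)": None},
]
tidy = clean_rows(raw)
print(tidy)
```

The second row is dropped because every value in it is empty, and the remaining row ends up with snake_case column names.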

Top 10 Trending Python Projects On GitHub: 2020

As per the latest Data Science skills study, the data scientists and practitioners who were surveyed revealed that the top language preferred for Statistical Modelling is Python, favoured by 65.2% of the respondents.

Python is the language of choice for statistical modelling among the Data Science community and among AI and analytics practitioners seeking to upskill. Their preferences include Python for Statistical Modelling; TensorFlow for Python Frameworks; Git for Sharing code, among others.

Below, we list the top 10 trending open-source Python projects on GitHub.

  1. Manim
  2. DeepFaceLab
  3. Airflow
  4. GPT-2
  5. Horovod
  6. ML-Agents
  7. XSStrike
  8. NeuralTalk
  9. Xonsh
  10. Optuna

2020-08-25 07:30:11+00:00 Read the full story…
Weighted Interest Score: 2.7718, Raw Interest Score: 1.8018,
Positive Sentiment: 0.1982, Negative Sentiment 0.0721

CloudQuant Thoughts : This type of article is always interesting!

5 Common Skills Data Scientists Should Know

A close look at the popular skills that I have used as a Data Scientist.

  • Introduction
  • SQL
  • Python or R
  • Jupyter Notebook
  • Visualizations
  • Communication
  • Summary
  • References

Data Science and Machine Learning can oftentimes require an overwhelming number of skills. However, after working for several years at several companies as a Data Scientist, I wanted to highlight five common skills Data Scientists should know. As a Data Scientist, you can most likely expect to use some of these skills in your career. I will be outlining SQL, Python/R, Jupyter Notebook, visualizations, and communication.

You will, of course, encounter even more required skills and beneficial skills as you work along, but I hope these serve as a good start or enhancement of where you are in your current journey as a Data Scientist.
2020-08-26 03:25:37.463000+00:00 Read the full story…
Weighted Interest Score: 4.8824, Raw Interest Score: 2.2386,
Positive Sentiment: 0.3535, Negative Sentiment 0.0295

AI and big data salaries revealed: Here are the six-figure wages enterprise giants like IBM, Salesforce, and Microsoft pay the tech talent working on these cutting-edge technologies

Big data and AI have become critical tools that help businesses — including major corporations — operate more efficiently and grow faster.

This has led to a spike in demand for data scientists, analysts and engineers, and experts in building AI and machine learning systems.

Here’s how much IBM, Oracle, Cisco, Microsoft, ServiceNow, and Salesforce pay data scientists, analysts, and engineers based on disclosure data for permanent and temporary w…
2020-08-20 00:00:00 Read the full story…
Weighted Interest Score: 4.6108, Raw Interest Score: 2.1647,
Positive Sentiment: 0.1273, Negative Sentiment 0.0849

GlobalTrading Podcast Episode 6: Data Science on the Buy Side

Gary Collier, CTO of Man Group Alpha Technology, and Hinesh Kalian, Director of Data Science, Man Group, discuss the state of data science on the buy side, spanning its evolution, current challenges, and the future outlook. The podcast is moderated by Global Trading Editor Terry Flanagan.
2020-08-18 14:34:23+00:00 Read the full story…
Weighted Interest Score: 6.1489, Raw Interest Score: 1.9417,
Positive Sentiment: 0.0000, Negative Sentiment 0.3236

ModelOps: MLOps’ next frontier

In the world of artificial intelligence (AI) and machine learning (ML), as the technology advances, so too does the lexicon of terminology required to be conversant. Almost every day, there’s a new buzzword capturing the attention of the market, leaving the rest of us with yet another topic on our research agendas.

Recently, the attention has centered on “ModelOps,” or AI model operationalization. Gartner describes ModelOps as focused on the governance and life cycle management of AI and decision models, while enabling the retuning, retraining, or rebuilding of AI models — providing an uninterrupted flow between the development, operationalization, and maintenance of models within AI-based systems.

ModelOps also provides business leaders insight into model performance and outcomes in a transparent and understandable way that doesn’t require translation or explanation by data scientists or machine learning engineers.

2020-08-25 00:00:00 Read the full story…
Weighted Interest Score: 4.3061, Raw Interest Score: 1.6095,
Positive Sentiment: 0.3408, Negative Sentiment 0.0379

Competitive Advantages Drive Sweet Growth Opportunities For The Hershey Company

Photo by: John Nacion/STAR MAX/IPx 2020 6/29/20 Atmosphere in New York City amidst … anti-police protests and the Coronavirus Pandemic. Businesses continue to reopen during phase 2 of the city’s plan to get the economy back up and running. Telecom giant Verizon is pulling its advertising from Facebook, in what may be the biggest brand yet to join the #StopHateForProfit boycott. Other brands such as North Face, Coca Cola, Honda, Hersh…
2020-08-25 00:00:00 Read the full story…
Weighted Interest Score: 3.3242, Raw Interest Score: 1.6117,
Positive Sentiment: 0.3857, Negative Sentiment 0.1015

Python has overtaken Java as one of the hottest programming languages in the world, according to GitHub. Here’s how a boom in AI jobs is helping developers use the easy-to-learn language to land six-figure salaries

From the popular AI project TensorFlow to Facebook’s Instagram, here’s why Python has become so popular among developers.

The programming language Python has made learning to code much easier, including for would-be developers without computer science degrees.

Since it launched in 1991, it has gained popularity among engineers and non-programmers alike, including data scientists, students, and business professionals. Dr. Chuck Severance, a clinical professor at the University of Michigan School of Information who teaches a 10-week Python course on Coursera, calls Python the “Netflix of programming.”

It’s approachable, widely useful, and extremely popular right now. In just the second week of August, nearly 8,000 people completed his course, and many former students have walked away with new jobs, he says. Python’s popularity has grown largely because of the explosion in data science jobs, experts say, which the language is particularly well-suited for.

Python even surpassed Oracle’s Java for the first time in usage and popularity in 2019, according to GitHub, to become the second most-used language after the web programming language JavaScript. A June survey from developer-focused analyst firm RedMonk found the same results. Usage of Python on GitHub projects grew 151% last year.

It has grown quickly because of its ease of use, utility, and open source nature, experts say, as well as because of the boom in artificial intelligence and data science jobs.

2020-08-20 00:00:00 Read the full story…
Weighted Interest Score: 3.2638, Raw Interest Score: 1.9049,
Positive Sentiment: 0.3124, Negative Sentiment 0.1310

5 Automation tools for supercharging your next Data Science project

Using AI to do AI – Automation has transformed many industries around the world. From self-service checkouts in supermarkets to car-building robots, technological solutions are constantly encroaching on the areas of work once the exclusive domain of humans.

As Data Scientists, we are not immune from this. Every day new products are being developed to automate parts of the Data Science life-cycle.

  • Data wrangling
  • Feature engineering
  • AutoML
  • Hyperparameter Optimisation
  • Neural Architecture Search

2020-08-26 02:45:52.252000+00:00 Read the full story…
Weighted Interest Score: 3.2215, Raw Interest Score: 1.5873,
Positive Sentiment: 0.2577, Negative Sentiment 0.1649

Unify Data Governance with Data Architecture

Think of an organization trying to create a single understanding of the information of the organization and the instances of that data around its estate. Consider different groups of people contributing to and using this model from different perspectives and varying reasons. And view this in the context of Data Governance, Data Architecture or Business Intelligence. A seemingly simple task becomes as complicated as six blind men building a model of an elephant. Each blind man has a different perspective of the elephant. This is similar to stakeholders and staff, scattered across the organization, having different conceptions and implementations of Data Governance.

As a result, many organizations end up with silos of knowledge that are fundamentally different, owned by different groups and used for different purposes. This presents risk and cost to an organization where Data Governance is important. The data architect is located at the center of this and often has the most mature and detailed view of information and data. However, data architects struggle to unite the silos and the teams involved.
2020-08-25 07:35:17+00:00 Read the full story…
Weighted Interest Score: 3.1000, Raw Interest Score: 1.7851,
Positive Sentiment: 0.1380, Negative Sentiment 0.1012

Top 10 Data Scientists In India

The data science community is growing at a fast pace, and data science units are becoming a crucial part of the organisations across industries. From interpreting large data sets to putting it in use for bringing out business decisions, data scientists are responsible for making data-driven decisions in an organisation. Recognising the data science professionals who have showcased an exceptional journey in the domain, Analytics India Magazine brings out the list of top data scientists each year.

This is the sixth year of the industry-acclaimed list where we have listed top 10 data scientists with diverse backgrounds who have made significant contributions, brought about unique innovations and showcased unparalleled accomplishments in their data science journey.

For the list, we have considered data scientists who are working with organisations or independently, irrespective of the size and nature of work. Also, we do not repeat names from previous years, so, do check earlier years’ inclusions.
2020-08-25 05:30:35+00:00 Read the full story…
Weighted Interest Score: 3.0742, Raw Interest Score: 1.7877,
Positive Sentiment: 0.3184, Negative Sentiment 0.0857

Fundamentals of Machine Learning Enabled Analytics

The famous theoretical physicist Stephen Hawking said, “It’s tempting to dismiss the notion of highly intelligent machines as mere science fiction.”

Artificial intelligence (AI), the game-changer technology of the global business world, comprises three distinct sub-disciplines: machine learning (ML), natural language processing (NLP), and cognitive computing. Automated solutions in business analytics use all these sub-technologies, but in varying degrees. Most advanced analytics platforms have incorporated ML or deep learning (DL) techniques to remain competitive in the market.

According to Gartner, 40 percent of all new enterprise applications will include AI technologies by 2021. On the other hand, organizations are flooded with data; the current challenge is extracting competitive intelligence from that “deluge of data.” Businesses that plan on surviving the digital tsunami (big data and IoT), have all put a definite business strategy in place, which connects data, analytics, and AI across the operative landscape.

2020-08-18 07:35:00+00:00 Read the full story…
Weighted Interest Score: 2.9981, Raw Interest Score: 1.8291,
Positive Sentiment: 0.3577, Negative Sentiment 0.2350

The Best Document Similarity Algorithm in 2020: A Beginner’s Guide

Picking the winner from 5 popular algorithms based on an experiment

If you want to know the best algorithm for the document similarity task in 2020, you’ve come to the right place.

With 33,914 New York Times articles, I’ve tested 5 popular algorithms for the quality of document similarity. They range from a traditional statistical approach to a modern deep learning approach.

Each implementation is less than 50 lines of code. And all models being used are taken from the Internet. So you will be able to use it out of the box with no prior data science knowledge, while expecting a similar result.

In this post, you’ll learn how to implement each algorithm and how the best one is chosen.
2020-08-25 23:33:45.122000+00:00 Read the full story…
Weighted Interest Score: 2.8127, Raw Interest Score: 0.9754,
Positive Sentiment: 0.3585, Negative Sentiment 0.1054
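
The excerpt above does not name the five algorithms, so as an illustration only, here is a minimal sketch of one well-known traditional statistical approach to document similarity: TF-IDF vectors compared with cosine similarity. It uses only the Python standard library; the function names, the smoothed IDF formula, and the three sample sentences are our own assumptions and are not taken from the article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Smoothed IDF (an assumption): common terms keep a small weight.
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents for illustration.
docs = [
    "Homebuilder stocks hit record highs as home sales surge",
    "Home sales surge and homebuilder stocks extend gains",
    "A machine learning algorithm confirms fifty new planets",
]
vecs = tfidf_vectors(docs)
sim_housing = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_housing > sim_unrelated)
```

The two housing sentences share several terms and score well above zero, while the planets sentence shares no terms with the first one and scores zero.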

Proposed Market Data Infrastructure Regulation And Anticipated Impact

In February 2020, the SEC proposed the Market Data Infrastructure rule, which aimed to enhance the availability and usefulness of National Market System (NMS) information for a wide variety of participants, as well as to help reduce information asymmetries between market participants who rely upon current Securities Information Processor (SIP) data and those who use the proprietary data feeds from the national securities exchanges.

2020-08-25 09:55:25+00:00 Read the full story…
Weighted Interest Score: 2.7723, Raw Interest Score: 1.6541,
Positive Sentiment: 0.2128, Negative Sentiment 0.0755


This news clip post is produced algorithmically based upon CloudQuant’s list of sites and focus items we find interesting. We used natural language processing (NLP) to determine an interest score, and to calculate the sentiment of the linked article using the Loughran and McDonald Sentiment Word Lists.
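
For readers curious how word-list sentiment works in general, here is a minimal sketch. It is not our production code: the two word sets are tiny placeholders standing in for the full Loughran and McDonald lists, the function name sentiment_scores is hypothetical, and expressing each score as matched words per 100 words is an assumption made for this example.

```python
import re

# Tiny placeholder sets for illustration only; the real Loughran and
# McDonald lists are far larger and should be loaded from their source.
POSITIVE_WORDS = {"gain", "gains", "strong", "improve", "improved"}
NEGATIVE_WORDS = {"loss", "losses", "decline", "declined", "weak"}


def sentiment_scores(text):
    """Return positive and negative word counts per 100 words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {"positive": 0.0, "negative": 0.0}
    positive = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negative = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    return {
        "positive": 100 * positive / len(tokens),
        "negative": 100 * negative / len(tokens),
    }


# Made-up sentence: 10 words, 2 positive matches, 1 negative match.
scores = sentiment_scores(
    "Strong sales drove gains despite a small loss last quarter"
)
print(scores)
```

A dictionary approach like this is simple and transparent, but it ignores negation and context, so the scores are best read as a rough signal.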

If you would like to add your blog or website to our search crawler, please email customer_success@cloudquant.com. We welcome all contributors.

This news clip and any CloudQuant comment is for information and illustrative purposes only. It is not, and should not be regarded as investment advice or as a recommendation regarding a course of action. This information is provided with the understanding that CloudQuant is not acting in a fiduciary or advisory capacity under any contract with you, or any applicable law or regulation. You are responsible to make your own independent decision with respect to any course of action based on the content of this post.