Alternative Data News. 29 July 2020

The AltDataNewsletter by CloudQuant

Finding sources and uses for alternative data can be difficult. At CloudQuant we regularly read and search the internet for new sources of data that can be used in our mission to find alpha signals and build quantitative trading strategies. We recognize that we are technology and data junkies so we wrote our own crawler that specifically seeks out web pages, posts, and news articles that give us a snapshot of what is going on in the world of Alt Data. The following is a collection of articles that we think you will find interesting from the past week.


The Top Trending Google Searches in Every US State Throughout the 2010s

The map shows the highest trending Google searches for every state. Trending searches from 2010 to 2019 were taken from Google’s annual Year in Search summary.

Google Trends provides weekly relative search interest for every search term, along with the interest by state. Using these two datasets for each term, we’re able to calculate the relative search interest for every state for a particular week. Linear interpolation was used to calculate the daily search interest.
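
The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the map author's code: the function names, the sample numbers, and the choice to combine the two datasets by multiplying national weekly interest by the state's relative interest (both on a 0 to 100 scale) are assumptions.

```python
from datetime import date, timedelta

def state_weekly_interest(weekly, state_share):
    """Scale national weekly interest (0-100) by a state's relative interest (0-100).

    Multiplying the two series is an assumed way of combining them.
    """
    return {week: value * state_share / 100.0 for week, value in weekly.items()}

def interpolate_daily(weekly):
    """Linearly interpolate weekly points (date -> value) to one value per day."""
    weeks = sorted(weekly)
    if not weeks:
        return {}
    daily = {}
    for start, end in zip(weeks, weeks[1:]):
        span = (end - start).days
        for offset in range(span):
            fraction = offset / span
            daily[start + timedelta(days=offset)] = (
                weekly[start] + fraction * (weekly[end] - weekly[start])
            )
    # The final weekly point has no following week, so copy it as-is
    daily[weeks[-1]] = weekly[weeks[-1]]
    return daily

# Hypothetical numbers for one search term
national = {date(2019, 1, 6): 40.0, date(2019, 1, 13): 75.0}
texas = state_weekly_interest(national, state_share=80.0)
texas_daily = interpolate_daily(texas)
```

Each day between two weekly readings gets a value on the straight line joining them, which is what gives an animated map a smooth transition from week to week.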

As the 2020 Year In Search summary is not yet available, topics were sourced from Google’s Trending Searches page. These topics were supplemented with archived copies of the same page through the Wayback Machine.

Tools: Excel, Python and Blender 2.8
Sources: Trending topics from 2010 to 2019 were taken from Google’s annual Year in Search summary.

Read the full story…

CloudQuant Thoughts : It is fascinating to watch trends spread across the US and sometimes fail to break in particular states where they have their own thing going on. And this is just beautiful data science. It is truly wonderful when someone with two quite different interests (Data Science and Blender) manages to bring them together to make something that few others could create.

Percent of Adults Who Sleep <7 Hours in US per County

Source: https://www.countyhealthrankings.org/app/wyoming/2020/measure/factors/143/datasource

Tools: python for data processing; vega + js for the plot

Created for https://city-data.com

2020-07-27 Read the full story…

CloudQuant Thoughts : A good night’s sleep is thought to be one of the most important things for human health and happiness. Who knows how useful alternative data like this could be?

Hedge funds are overhauling the way they use alt data to find winning stocks as the crisis forces even quants to think big-picture like macro investors

Alternative data is an increasingly important part of hedge funds’ investment processes, but the pandemic has changed the way firms use the info.

Traditionally, quants and long-term stock-pickers use data to compare companies against each other, to find a winner in a certain field.

The virus, however, has the entire world waiting on a restart — and it has turned just about every money manager into a macro investor, constantly thinking about the..

2020-07-25 00:00:00 Read the full story…
Weighted Interest Score: 4.2853, Raw Interest Score: 1.6977,
Positive Sentiment: 0.1078, Negative Sentiment 0.0539

CloudQuant Thoughts : Going from trying to track exactly what people are buying in stores to nothing more than which stores they are going into is not rocket science. A large number of physical retail stores are going to go bankrupt before this coronavirus has passed.

Essentia Analytics Data Shows Where Alpha Is Lost And Found

Research by Essentia Analytics, which provides behavioral data analytics and consulting for professional investors, identified how managers could have saved an average of 94 basis points of performance per year by selling stocks earlier.

Clare Flynn Levy, founder and chief executive of Essentia Analytics, told Markets Media: “Firms need to have the right data to determine the factors that create alpha as each portfolio, and each manager, is different. The dataset should include all their deals over at least five years as well as the relevant market data.”

Over three months Essentia’s research team analyzed 60 portfolios over 14 years and tracked 24 ‘categorizers’, ranging from equity sector to holding period to decision day of the week, across six broad investment decision categories, or skills: stock picking, size adjusting, entry timing, exit timing, scaling in and scaling out.

2020-07-24 Read the full story (at marketsmedia)…
2020-07-28 13:28:04+00:00 Read the full story (at tradersmagazine)…
Weighted Interest Score: 5.7064, Raw Interest Score: 1.8621,
Positive Sentiment: 0.1510, Negative Sentiment 0.2516

CloudQuant Thoughts : Repeatedly re-running your models with alternative exit strategies and grouping the results by as many different factors as you can think of is essential to keep ahead of the game. Fortunately, in this day and age, it is quite simple to do!
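
As a rough illustration of that idea, the sketch below re-evaluates a handful of made-up trades under two exit rules (sell after 1 day or after 3 days) and groups the average return by sector. The trade data, the field names, and the fixed-holding-period exit rule are all hypothetical; this is not Essentia's methodology or a CloudQuant API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trades: a grouping factor plus the daily prices after entry
trades = [
    {"sector": "tech", "prices": [100, 104, 103, 99]},
    {"sector": "tech", "prices": [50, 51, 53, 54]},
    {"sector": "retail", "prices": [20, 21, 19, 18]},
]

def exit_return(prices, holding_days):
    """Return from the entry price to the price holding_days later (capped at the last price)."""
    exit_index = min(holding_days, len(prices) - 1)
    return prices[exit_index] / prices[0] - 1.0

def average_return_by_group(trades, holding_days, key):
    """Average the exit return of all trades that share the same value of `key`."""
    grouped = defaultdict(list)
    for trade in trades:
        grouped[trade[key]].append(exit_return(trade["prices"], holding_days))
    return {group: mean(returns) for group, returns in grouped.items()}

# Re-run the same trades under each alternative exit rule
results = {days: average_return_by_group(trades, days, "sector") for days in (1, 3)}
```

Swapping `"sector"` for any other field on the trade records (day of week, position size bucket, and so on) regroups the same results by a different factor.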

Nasdaq Has Record US Equities And Options Volumes

Nasdaq reported that combined U.S. equities and options markets set a quarterly record for trading volume, boosting revenue in Market Services by 22% from a year ago. The exchange group said electronically operated equities, options and fixed income markets operated at high performance levels during the surge of trading volume related to the COVID-19 pandemic.

Adena Friedman, president and chief executive of Nasdaq, said in a statement: “Our foundational markets are demonstrating their resilience and the power of a distributed, electronic market model, handling record volumes through multiple periods of extreme volatility.”
2020-07-22 20:36:09+00:00 Read the full story…
Weighted Interest Score: 4.9952, Raw Interest Score: 1.9871,
Positive Sentiment: 0.1197, Negative Sentiment 0.0000

CloudQuant Thoughts : Whilst not Alternative Data, this does demonstrate the hyperactivity of the market, and where there is hyperactivity there will be (alternative) data that helps you to see through the noise.

Building a Python Covid-19 Dashboard using Streamlit

In data visualization, dashboards are the Graphical User Interfaces which display data in an informative and highly interactive way. They contain various plots, such as bar, pie, and line charts, that visualize a dataset so that we can derive useful information from it. Dashboards are useful because they are easy to understand and provide us with a clear picture of the key performance indicators.

Streamlit is an open-source Python library that allows us to build beautiful, highly interactive, and informative dashboards easily. It also allows us to create custom Machine Learning and Data Science applications. Every time we save the code, Streamlit reruns it from top to bottom and displays the changes within seconds. The UI of Streamlit is visually appealing and comes ready-made, so we don’t have to write UI code for the app.

2020-07-29 11:30:00+00:00 Read the full story…
Weighted Interest Score: 2.4520, Raw Interest Score: 0.9605,
Positive Sentiment: 0.1298, Negative Sentiment 0.0649

CloudQuant Thoughts : A neat but simple-looking Python library, could be fun!


ESG Section

CloudQuant Thoughts : Simply stating that ESG is the best form of investment at the moment may be a little simplistic. Examine the stocks that are listed in ESG ETFs and you may be surprised. Also, looking at our first ESG story this week, on Tesla: how much of the growth of ESG investments is down to the likes of Tesla and Amazon? (It may be shocking to some that Amazon is even considered an ESG stock.) Don’t forget that CloudQuant also has curated datasets available, including ESG datasets. Head over to our Data Catalog for more information.

Is Tesla’s green investment bubble about to burst?

Elon Musk’s EV giant occupies a unique position, ‘miles ahead of the competition’. But with rivals lining up, how long can it stay there?

Tesla’s nosebleed-inducing rise in share price shows no sign of slowing down. From lows of $185 last May, the company’s shares reached new highs of $1,643 this week ahead of its crucial second-quarter earnings on Wednesday. Long doubted and dismissed, it has now posted three consecutive quarters of profit, including one that took it through a global pandemic. It is worth $250bn, the most valuable car company in the world, and it attracts devoted fans like no-one else.

Its success is often pinned on charismatic chief executive Elon Musk, and the company’s early adoption of the electric vehicle technology that is becoming increasingly mainstream amid widespread incentives pushing carmakers away from gasoline. Tesla also occupies a unique position in the market. It’s a green company, but also a manufacturing company and a technology company. This means it attracts investment from all three sectors, and can capitalise on a resurgent interest in “green” stocks.

Interest in funds and shares driven by ESG (environmental, social, governance) priorities is growing. Data from financial services firm Morningstar showed that the first quarter of this year was a record one for sustainable funds.

2020-07-22 00:00:00 Read the full story…
Weighted Interest Score: 2.7022, Raw Interest Score: 1.3436,
Positive Sentiment: 0.2495, Negative Sentiment 0.2687

DOL Plan to Limit ESG in 401(k)s Draws Growing Opposition

Opposition to the Labor Department’s proposal to limit environmental, social and governance focused investments in 401(k) plans is growing, along with requests for a longer comment period.

Morningstar, Heartland Capital Strategies, Principles for Responsible Investment and Institutional Shareholder Services (ISS) have written comment letters opposing the proposal along with 41 Democratic members of the House, 13 Democratic members of the Senate and others.

In addition, a coalition of trade groups representing financial institutions with business in the defined contribution space are asking Labor for a 30-day extension to the public comment period on the proposal that is scheduled to end on July 30. They include the American Bankers Association, the Securities Industry and Financial Markets Association, the Insured Retirement Institute, the Investment Company Institute, the Defined Contribution Institutional Investment Association, the Investment Adviser Association and the SPARK Institute.

2020-07-27 00:00:00 Read the full story…
Weighted Interest Score: 2.6001, Raw Interest Score: 1.2922,
Positive Sentiment: 0.0630, Negative Sentiment 0.4570

4 Reasons To Take Another Look At Sustainable Investing In 2020

Looking for investment opportunities in 2020’s ever-changing markets? Why this could be a good time. If you thought sustainable investing was just a do-gooder approach, it’s time to take another look.

With all that’s changed in 2020 so far, you may not have realized that sustainable investing is emerging as a way forward. Sustainable funds are seeing a surge in assets, and some of the world’s largest asset managers see growing opportunities in sustainable investing.

If sustainable investing hasn’t been on your radar, here are four reasons why it’s worth paying attention to it now.

1. Major investment firms see sustainable investing as the future.
2. Fund companies are launching sustainable funds at a record pace.
3. Sustainable investing is being used to help manage risk in uncertain times.
4. Performance has become a top reason to invest sustainably.

2020-07-28 00:00:00 Read the full story…
Weighted Interest Score: 3.9029, Raw Interest Score: 2.0971,
Positive Sentiment: 0.2913, Negative Sentiment 0.1553

Net Flows Into Passive ESG Funds Have Outpaced Active

The US SIF Foundation today released The Rise of ESG in Passive Investments, a report that explores the growth of passive ESG (environmental, social and governance) investing and the debate on the effectiveness of passive versus active ESG funds. The paper draws on publicly available data and insights from the US SIF Foundation research advisory committee and from additional asset manager members of US SIF.

While the vast majority of sustainably invested assets are in actively managed ESG funds, net flows into passively managed ESG funds have in recent yea…
2020-07-29 09:17:57+00:00 Read the full story…
Weighted Interest Score: 3.7729, Raw Interest Score: 2.1157,
Positive Sentiment: 0.1058, Negative Sentiment 0.0353

HSBC Forms Dedicated ESG Solutions Team

HSBC announced the formation of a dedicated Environmental, Social and Governance (ESG) Solutions unit to help clients around the world rebuild and transition their businesses and economies in a more sustainable way post-COVID-19.

HSBC has taken a leading global role in ESG financing in recent years and the new unit will more effectively focus the bank’s full range of capabilities and expertise in providing clients with ESG-related advice, strategies and financing ideas.

The ESG unit will form part of a new Strategic Solutions Group, within the bank’s Capital Financing & Investment Banking Coverage division. The group will also comprise two other components – one focusing on Corporate Finance Solutions and one on Financial Institutions & Capital Solutions. They will link closely with HSBC’s sector and product bankers to provide strategic advice and financing ideas tailored to specific industries and market sectors.
2020-07-28 13:06:46+00:00 Read the full story…
Weighted Interest Score: 3.2774, Raw Interest Score: 1.8839,
Positive Sentiment: 0.2581, Negative Sentiment 0.0258


How Does Data Management Drive Efficiency for Organizations?

Data-driven analytics continue to deliver sophisticated solutions for manufacturing efficiency, early disease detection, and smart capabilities building in workplaces. Thus, industry operators and leaders continue to raise their expectations and demands from data technologies with every passing year. Looking Behind the Curtain: What Really Drives Value from Data reveals some insights that global managers can learn from.

Brent Gleeson of Forbes, who regularly writes about organizational excellence, warns that in spite of having the best infrastructure, technological support, and military intelligence, the United States could not stop many attacks against it. This important observation signals the need for speedy, data-enabled decision-making at a time of crisis.

In The Benefits of Leading Data-Driven Organizational Change, Gleeson points out that a lack of adequate technology preparedness, lack of technology application training, and lack of informed decision-making mindset probably all contributed to the national disaster in September 2001. Transformational decision-making, according to Gleeson, requires a “shift in mindset and culture.”
2020-07-21 07:35:02+00:00 Read the full story…
Weighted Interest Score: 4.3911, Raw Interest Score: 2.5047,
Positive Sentiment: 0.3354, Negative Sentiment 0.2620

Improving massively imbalanced datasets in machine learning with synthetic data

We will use synthetic data and a few concepts from SMOTE to improve model accuracy for fraud, cyber security, or any classification with an extremely limited minority class

Handling imbalanced datasets in machine learning is a difficult challenge, and can include topics such as payment fraud, diagnosing cancer or disease, and even cyber security attacks. What all of these have in common is that only a very small percentage of the overall transactions are actually fraud, and those are the ones that we really care about detecting. In this post, we will boost accuracy on a popular Kaggle fraud dataset by training a generative synthetic data model to create additional fraudulent records. Uniquely, this model will incorporate features from both fraudulent records and their nearest neighbors, which are labeled as non-fraudulent but are close enough to the fraudulent records to be a little “shady”.
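
For readers unfamiliar with SMOTE, the core idea can be sketched with nothing but the Python standard library. Note that this is plain SMOTE-style interpolation between minority-class rows, not the generative model described in the article, and the function names and sample rows are made up for illustration.

```python
import random

def nearest_neighbors(point, others, k):
    """Return the k rows in `others` closest to `point` (squared Euclidean distance)."""
    def sq_dist(other):
        return sum((a - b) ** 2 for a, b in zip(point, other))
    return sorted(others, key=sq_dist)[:k]

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        candidates = [row for row in minority if row is not base]
        neighbor = rng.choice(nearest_neighbors(base, candidates, k))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class (fraud) rows with two features each
fraud_rows = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4]]
new_rows = smote(fraud_rows, n_new=5)
```

Every synthetic row lies on a line segment between two real minority rows, so the new records stay inside the region the minority class already occupies.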

Our imbalanced dataset : For this post, we selected the popular “Credit Card Fraud Detection” dataset on Kaggle. This dataset contains labeled transactions from European credit card holders in September 2013. To protect user identities, the dataset uses dimensionality reduction of sensitive features into 27 floating point columns (V1–27) and a Time column (the number of seconds elapsed between this transaction and the first in the dataset). For this post, we will work with the first 10k records in the Credit Card fraud dataset.
2020-07-27 15:18:13.069000+00:00 Read the full story…
Weighted Interest Score: 4.1391, Raw Interest Score: 1.2868,
Positive Sentiment: 0.3309, Negative Sentiment 0.6127

Explorium raises $31 million to automate data prep with AI

Explorium, a Tel Aviv-based startup developing an automated data and feature discovery platform, today closed a $31 million funding round. The capital infusion comes after several banner months for Explorium, which has tripled its customer base since last September and incorporated data relevant to more industries and verticals.

Feature engineering (the process of using domain knowledge to extract features from raw data via data-mining techniques) is arduous. According to a Forbes survey, data scientists spend 80% of their time on data preparation, and 76% view it as the least enjoyable part of their work. It’s also expensive: Trifacta pegs the collective data prep cost for organizations at $450 billion.

Explorium aims to solve this by acting as a repository for a company’s information, connecting siloed internal data to thousands of external sources on the fly. Using machine learning, it claims to automatically extract, engineer, aggregate, and integrate the most relevant features from data to power sophisticated predictive algorithms, evaluating hundreds before scoring, ranking, and deploying the top performers.
2020-07-28 00:00:00 Read the full story…
Weighted Interest Score: 3.8652, Raw Interest Score: 2.0949,
Positive Sentiment: 0.0582, Negative Sentiment 0.1455

NVIDIA, BMW, Red Hat, and more on the promise of AI, edge computing, and computer vision

On the third day of Transform 2020, the IoT, AI at the Edge, and Computer Vision Summit presented by NVIDIA underscored the tremendous promise of these technologies. IoT is being leveraged in more transformative ways than ever, the limits of compute power on devices keep getting pushed, and computer vision models are becoming faster and more accurate.

But innovation also brings new challenges. Leaders from NVIDIA, BMW, Pinterest, Intel, Uber, and Red Hat among others gathered to talk about the most important new use cases and the most urgent issues: from ensuring greater user privacy to enabling lower latency, accelerating better search and personalization, advancing automation, delivering real-time intelligence, and more.

Implementing new AI technologies also brings new responsibilities like security, governance, accuracy, and explainability, as well as a major focus on eliminating biases around race and gender.

Here’s a look at some of the top panels of the summit.
2020-07-28 00:00:00 Read the full story…
Weighted Interest Score: 3.8492, Raw Interest Score: 1.6980,
Positive Sentiment: 0.2743, Negative Sentiment 0.1306

Top 10 Free Data Science Podcasts One Can Binge During This Lockdown

In today’s fast-paced world, podcasts have proved to be an incredibly great source of learning for data scientists who are willing to learn more from all the possible resources available. Alongside, amid COVID, when the majority of data professionals are working from home, podcasts are turning out to be an excellent way to not only upskill themselves but also to pass leisure time.

Not only would AI and data science podcasts help these professionals stay up to date with the latest trends and research, they would also help them understand the core workings of various data science applications. Furthermore, many of these data science podcasts also invite some of the renowned minds of the industry, helping data science professionals gain more understanding of this industry.

COVID lockdown can be monotonous and daunting for many, and thus to make good use of the leisure time, data scientists can get their hands on some of the informative and exciting AI and data science podcasts. In this article, we are going to share some of the data science podcasts that one can binge during this lockdown.

2020-07-29 07:30:00+00:00 Read the full story…
Weighted Interest Score: 3.5767, Raw Interest Score: 1.9941,
Positive Sentiment: 0.3234, Negative Sentiment 0.0539

Why Metadata is Even More Important Than Data

Most companies, whether in finance, retail, or government sectors, are seeking to make the most of their data to gain a competitive advantage. But not many organizations are making effective use of metadata. Arguably, data on its own can be meaningless, but when combined with metadata, it turns into information that can be exploited and, when aggregated with other datasets, delivers the insight that every organization needs to improve decision-making.

A recent Veritas Report on unlocking the value of data found that, on average, employees lose two hours a day searching for data, resulting in a 16 percent drop in workforce efficiency. For an organization of 1,000 workers that are dependent on data, the inability to find the right data at the right time costs that organization £16m a year. If run correctly, a metadata project will deliver significant time and cost savings in the short-term and enhance the effectiveness of data projects in the medium-to-longer term.

Therefore, all companies should ask themselves: Why not explore a metadata project to see how it can deliver savings and unlock future value from data?
2020-07-29 07:25:22+00:00 Read the full story…
Weighted Interest Score: 3.1880, Raw Interest Score: 1.6287,
Positive Sentiment: 0.4886, Negative Sentiment 0.1629

Transparency: a Step Towards Fairness

PULSE, a recent machine learning image reconstruction algorithm, sparked a lot of controversy. The purpose of the model was to reconstruct blurry, low resolution images. Unfortunately, when a low resolution image of President Obama was provided as input to the model, it reconstructed the face of a white man.

Some machine learning experts attributed the model’s racial bias to the unevenness in the training data. They argued that Flickr-Faces-HQ, the dataset on which the model was pretrained, contained mostly images of white faces. Because of that, the model learned to reconstruct white faces most of the time. Their argument has some merit. Indeed, according to the paper that introduced PULSE, even the data used to evaluate the model, CelebAHQ, “has been noted to have a severe imbalance of white faces compared to faces of people of color (almost 90% white).” It is reasonable to expect that if a deep neural network is trained and evaluated on uneven datasets, not only will it learn to predict the most frequent class more often, but its evaluation will also fail to reveal this discrepancy.
2020-07-27 00:00:00 Read the full story…
Weighted Interest Score: 3.0438, Raw Interest Score: 1.5255,
Positive Sentiment: 0.1291, Negative Sentiment 0.4694

BestX Launches Post-Trade TCA For Equities

BestX, State Street’s foreign exchange and fixed income best execution analytics platform, announced today that it has expanded its award-winning execution analytics software and launched a post-trade transaction cost analysis (“TCA”) module for equity markets. Covering global stock markets, the new functionality provides clients with benefits of the unique BestX web interface alongside flexible data analysis, report configuration and generation.

“We responded to our clients’ needs by expanding BestX to provide a full multi-asset class offering,” said Pete Eggleston, BestX Co-Founder. “Whilst launching fixed income at the end of 2018, it quickly became clear that clients wanted one application to analyse all of their trading. The desire to consolidate data and vendors is a trend that we anticipate will accelerate over the next few years and we needed to ensure we were positioned appropriately for this.”
2020-07-29 10:59:28+00:00 Read the full story (at marketsmedia)…
2020-07-29 12:30:00 Read the full story (at finextra)…
Weighted Interest Score: 2.8451, Raw Interest Score: 1.8460,
Positive Sentiment: 0.1582, Negative Sentiment 0.0000

Learn Data Science in a Flash!?

I was a trained classical pianist in my previous professional life. Remember those infomercials claiming that you could learn to play the piano in a flash? This has been an ongoing joke between my husband and me for years. From time to time, he threatens to achieve in a mere four hours what took me years of blood, sweat, and tears!

So far, however, it has remained an empty threat (thankfully). I think most reasonable people understand these programs do not turn a complete newbie into a professional pianist in only a handful of hours.

I have always strongly encouraged people to learn and enjoy playing the piano. It fosters a deeper appreciation for the work of other musicians, and perhaps even leads to collaboration under the right circumstances. However, offering the skill as a professional service for a fee is an entirely different matter. It would be irresponsible for me to encourage that in someone who has only some cursory training.

Can You Learn Data Science in a Flash? : The same is true of data science, or anything else for that matter. The COVID pandemic has accelerated the process of making digitization into a minimum necessity to stay competitive or even survive. We should encourage others to learn about data so they can appreciate it and be more intelligent about it. It is essential today.

2020-07-23 00:00:00 Read the full story…
Weighted Interest Score: 2.3199, Raw Interest Score: 1.2649,
Positive Sentiment: 0.2333, Negative Sentiment 0.3070

Metadata Repository Basics: From Database to Data Architecture

Companies use a metadata repository to store and share information about data or metadata. Metadata repositories, once thought limited to databases or diagrams, have evolved sophisticated Data Architectures, driving businesses to transform the marketplace digitally.

Take the New South Wales (NSW) government’s Spatial Digital Twin, which went live in February 2020. NSW, an Australian state containing Sydney, envisioned a more efficient and better state infrastructure, including “major hospital upgrades.”

In response, Data61 created a 3D-model of Sydney, providing capabilities to see future changes and past construction. Magda, the system and power behind this digital twin, relies on a metadata repository to make tons of data faster to search and understand and to pull in even more data sets. From that metadata repository, hooked up to a data repository and data depository, Australians can digitally plan and build structures in real-time.
2020-07-29 07:35:13+00:00 Read the full story…
Weighted Interest Score: 2.3191, Raw Interest Score: 1.2884,
Positive Sentiment: 0.1610, Negative Sentiment 0.0859

Dice Tech Job Report: Q2 Offers Optimism for Tech Industry

How is the COVID-19 pandemic impacting the job market? The latest edition of the Dice Tech Job Report analyzed data from the second quarter, giving us a fuller picture of how employers are dealing with the current landscape.

Although nationwide tech postings in the second quarter of this year were down when compared to the same quarter in 2019, there was much positivity within the data nonetheless. For example, many tech hubs showed continued growth, along with technologist occupations that build, maintain, and expand tech infrastructure.

During the May-June period, as companies regained even more of their equilibrium, hiring picked up for “core” technologist occupations, including data engineers (a 51 percent increase Month-over-Month)…
2020-07-28 00:00:00 Read the full story…
Weighted Interest Score: 2.2832, Raw Interest Score: 1.7635,
Positive Sentiment: 0.1735, Negative Sentiment 0.1446


This news clip post is produced algorithmically based upon CloudQuant’s list of sites and focus items we find interesting. We use natural language processing (NLP) to determine an interest score, and to calculate the sentiment of the linked article using the Loughran and McDonald Sentiment Word Lists.
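
For the curious, word-list sentiment scoring of this kind can be sketched as follows. This is a simplified illustration, not CloudQuant's production code: the two tiny word sets are stand-ins rather than the actual Loughran and McDonald lists, and expressing each score as the percentage of words that match a list is an assumption.

```python
import re

# Tiny illustrative stand-ins; the real Loughran and McDonald lists are far larger
POSITIVE_WORDS = {"gain", "improve", "strong", "success"}
NEGATIVE_WORDS = {"loss", "decline", "bankrupt", "weak"}

def sentiment_scores(text):
    """Return the percentage of words that appear in each sentiment word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0}
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return {
        "positive": 100.0 * positive / len(words),
        "negative": 100.0 * negative / len(words),
    }

scores = sentiment_scores("Strong volumes offset a decline in fees and a loss in listings")
```

The sample sentence has 12 words, of which one is in the positive set and two are in the negative set, so the negative score is twice the positive score.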

If you would like to add your blog or website to our search crawler, please email customer_success@cloudquant.com. We welcome all contributors.

This news clip and any CloudQuant comment is for information and illustrative purposes only. It is not, and should not be regarded as investment advice or as a recommendation regarding a course of action. This information is provided with the understanding that CloudQuant is not acting in a fiduciary or advisory capacity under any contract with you, or any applicable law or regulation. You are responsible to make your own independent decision with respect to any course of action based on the content of this post.