Shane Khalid

Project List

Here are two new projects I’ve been working on:
* Stock Forecasting using RNN, LSTM, GRU
* Another Stock Forecasting using RNN, LSTM, GRU
This shows the price forecast for Google stock using RNN, LSTM, and GRU. Visually, the GRU looks closest to the actual price. However, when the models are run on their own, the RNN has an MSE (mean squared error) of 38.41, the GRU has an MSE of 44.51, and the LSTM has an MSE of 12.72, which makes it the best of the three.
To Do List:
* Improve current Time Series model(s)
* Implement Sentiment Analysis on markets
* Build WebScraper
* Create AI Chat Bot
It may be OCD but I really want to re-do every plotly and pandas viz I have in Seaborn. It’s so beautiful. …
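For reference, this is roughly how an MSE comparison like the one above can be computed. It is only a sketch: the prices and the two forecasts are made-up numbers, and `model_a`/`model_b` are hypothetical names, not the RNN, LSTM, or GRU from the project.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical closing prices and forecasts, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0]
forecasts = {
    "model_a": [101.0, 101.0, 103.0, 104.0],
    "model_b": [100.0, 103.0, 101.0, 107.0],
}

scores = {name: mean_squared_error(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)  # lowest MSE wins

for name, score in scores.items():
    print(f"{name}: MSE = {score:.2f}")
print("best model:", best)
```

A lower MSE means the forecast sits closer to the actual prices on average, which is why a model can look worse on a plot and still win on this metric.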

31 Oct 2023 • on LSTM, RNN, GRU, Deep Learning, Time-Series
Ritual 2.0

Here are two new projects I’ve been working on: * Stock Forecasting using LSTM * Sentiment Analysis of Stock Market using BERT …

26 Oct 2023 • on ritual, skincare, vascepa, tyrosine, taurine, bacopa, monnieri, bacopa monnieri, omega 3
Stock Prediction Deep Learning Model

I’m Back
New Code: Stock Prediction Deep Learning Model
It’s been a real while since I’ve updated this blog, mainly because I was working full-time at another gig. I am making the transition into Software Engineering/ML NLP Engineering officially as of right now. This means starting from the basics, i.e., Data Structures and Algorithms, in addition to doing LeetCode problems. A good friend of mine (and former roommate) who worked at Goldman Sachs as a Software Development Engineer told me that he did 700+ LeetCode problems to prepare for his interviews. Goldman Sachs has an eight-interview process for Software Development Engineers. I have only completed the first “interview”, which was really just a HackerRank Online Assessment. All of my test cases passed, but I have yet to hear from Goldman Sachs despite sending them emails almost every single day. But that’s okay. Even if I moved on to the next stage, I need to be amply prepared because they are live coding interviews. So, this has been my primary ‘job’ so to speak, meaning that I am fully invested.
Here’s my regimen, and my Leuchtturm A4 notebooks: one for DS and Algos, the other for LeetCode. I used a label maker to label them. My Study Regimen: My Notebooks: My CS Books: And here’s my wall. Not coding-related, but I’ve put up quite a bit of art.
Learning all of the material formally and doing LeetCode problems are the meat and potatoes of my day. When I do have some extra time, I work on keeping my portfolio updated. Much of the code that I wrote, I wrote before November 2021, when I got hired for a full-time role. So now I just have to run my code and fix all of the deprecated modules, syntax, and libraries. It’s quite annoying, but not quite as annoying as this: TensorFlow no longer supports GPU hardware acceleration on native Windows!! Considering I built my computer with a high-end graphics card specifically to do Deep Learning, this was quite a disappointment. However, I did find a work-around.
If you have a CUDA-enabled graphics card, which I am lucky enough to have, you can still use TensorFlow with GPU support through WSL2 (Windows Subsystem for Linux). Through this, I installed Ubuntu, and finally (after many many hours of troubleshooting) I was able to get it to detect my GPU and let me use it for TensorFlow’s back-end. Detecting GPU in terminal: Detecting Back-End:
When running some DL models, I noticed the speed was incredibly slow despite using my GPU. Using NZXT Cam, I saw that my GPU’s load would never exceed 25%. The time it took to run through each epoch was ~1500ms. Thankfully, this was fixed by simply increasing the batch size. This is actually a scenario in which it’s worse to not have a high GPU load. I’ll be honest, I didn’t experiment with many batch sizes. I just chose one outrageously high (going from 32 to 3200) just to see if it raised GPU load, and it did. But 3200 is something I will likely change during its next incarnation. I’m going to read up on it: Batch Size Choice and Batch Size Performance.
So today I took a dataset of Google stock prices and split it into Train (2010-2022) and Test (2023). There’s a lot of talk about how to best split training and testing sets. When I first learned Machine Learning I was doing research at Columbia University Medical Center, working with tiny datasets. We also had no GPUs, so it would take 27-30 hours to run through 100 epochs. The latter can be fixed with a dedicated GPU, but the former… I think it just left a strong impression on me, so I definitely have 20x more in the training set. So that’s what I used.
It’s kind of annoying that GitHub doesn’t properly display .ipynb notebooks. None of the graphs or visuals make it. Fortunately, there’s nbviewer. So here it is, the first code mini-project I’ve had in 2 years. Stock Prediction Deep Learning Model
Intra-Day Trading
So I’ve gotten back into this as of late, using much less capital. I’m trading GBPJPY under my friend’s tutelage.
The prices have been around their all-time high, so shorting is the way to go. That’s it for now. I will be updating this blog more as I grow along this career path and am able to give potential employers a motivation to hire me. Oh, and I will be updating/writing new coding projects - the next one will be on NLP and Sentiment Analysis. I also will be updating my portfolio and online CV. That’s it for now though. Oh yeah, and I have a QT baby.
{
    # Assumes Theano; weights_*, bias_* and cost are defined elsewhere.
    import theano.tensor as T

    params = [weights_hidden, weights_output, bias_hidden, bias_output]

    def sgd(cost, params, lr=0.05):
        # Gradient of the cost with respect to every parameter
        grads = T.grad(cost=cost, wrt=params)
        updates = []
        for p, g in zip(params, grads):
            # Step each parameter against its gradient
            updates.append([p, p - g * lr])
        return updates

    updates = sgd(cost, params)
}
…
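One plausible reason the batch-size change mentioned earlier in this post (32 to 3200) sped things up is simple arithmetic: a larger batch means far fewer gradient updates per epoch, each doing more work on the GPU at once. A minimal sketch, assuming a hypothetical training set of 3,200 samples (the post does not give the real size):

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (gradient updates) needed to see every sample once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_samples / batch_size)


NUM_SAMPLES = 3200  # hypothetical training-set size, not from the post

for batch_size in (32, 3200):
    steps = steps_per_epoch(NUM_SAMPLES, batch_size)
    print(f"batch_size={batch_size}: {steps} update(s) per epoch")
```

Fewer, larger updates keep the GPU busier, but they also change how the model trains, so a value between the two extremes is usually worth testing.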

18 Oct 2023 • on deep learning, machine learning, LSTM, RNN, GRU, tensorflow, stock market, forex, Linux, Ubuntu, WSL2, GPU
Crypto Forecasting - First Thoughts

Crypto Forecasting
Hierarchical Time-Series (HTS) means that time-series models will be merged into ensembles.
Models
1.1 For extrapolation of time-series data I will use an ensemble of neural nets, likely fully-connected (FC), TabNet, and AutoEncoders
1.2 Simple models will be LGB, XGB, CatBoost, HistGBM
1.3 Will use ARIMA, SARIMAX for their unique features when ensembling
1.4 Ensemble and stack
1.5 I initially wished to create a pure LSTM, but I have been having issues with these
Advanced Features
2.1 Fancy Loss Function: this is going to be a bitch.
2.2 Sample weights that catalyze the NN to work better
2.3 Incorporation of prior knowledge from the dataset into the network architecture
Implementations CV (Cross-Validation) + Model Hyperparameter Optimization Time-Series Models Feature Engineering NN MLP + AE LSTM Technical Analysis
Applying Legacy Asset Class Correlations to Cryptocurrencies
Stocks and T-Bonds have a negative correlation coefficient. When consumers are spending and the economy is expanding, you will see a rise in the prices of stocks. During times of economic contraction, you will instead see a rise in the prices of T-Bonds (and a fall in their yields). Investors can also exploit an absence of correlation: gold, for example, has a low to almost nonexistent correlation with equity markets, which helps weather the storm of economic turmoil. As cryptocurrency undergoes a shift from retail investors to institutional investors, it has been difficult to predict a pattern in cryptocurrency prices, which can swing with investor whims. The best we can do is investigate correlations. One example is the correlation between the price of BTC and the rate of inflation, which makes sense in theory, but a country like Venezuela presents a counterexample. Among cryptocurrency assets with substantial valuations, correlation is an on-and-off affair. Bitcoin prices have set investor and price momentum in crypto markets for most of the last decade.
Lately, however, as other cryptocurrencies have garnered popularity with developers and investors, that correlation has proven difficult to maintain. For example, bitcoin prices fell even as prices for Ethereum’s ether (ETH) rose to new heights in early 2018. - Correlations Within The Context of Cryptocurrencies …
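To make "investigating correlations" concrete, here is a small sketch of a Pearson correlation coefficient computed on period-over-period returns. The two price series are invented for illustration (`asset_a` and `asset_b` are hypothetical names, not BTC, ETH, or any real asset); they are built to move in opposite directions.

```python
import math


def returns(prices):
    """Simple period-over-period returns from a list of prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical prices: asset_b falls whenever asset_a rises, and vice versa.
asset_a = [100.0, 110.0, 99.0, 108.9]
asset_b = [50.0, 45.0, 49.5, 44.55]

rho = pearson_correlation(returns(asset_a), returns(asset_b))
print(f"correlation of returns: {rho:.3f}")
```

Correlating returns rather than raw prices is a deliberate choice here: two trending price series can look strongly correlated even when their day-to-day moves are unrelated.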

18 Nov 2021 • on deep learning, machine learning, theory, perceptron
Deep Learning

What is the difference between AI, Machine Learning, and Deep Learning? …

27 Jul 2021 • on deep learning, machine learning, theory, perceptron
Computer Neatness

Surface Pro 7 I have about a dozen moleskine notebooks of detailed notes in various subjects. I realized that I could condense them and bring them all together in a digital format. After doing a lot of research on tablets, I decided to get the Microsoft Surface Pro 7. Why? IPad’s and Amazon’s Fire Tablets don’t run on an computer’s OS. It’s always some shitty mobile-type of OS that restricts you from getting pirated copies of software. I am of the opinion that if you pay for software you are a cuck. So here’s what I do when I get a new computer! Immediately install a fresh copy of Windows 10 Professional 64-bit. This involves putting the ISO on a USB drive with a program like Rufus, and then you use it to boot your computer. Strangely, whenever you look up how to enter the BIOS for the Surface Pro 7 they instruct you to hold the (+) volume key as you power up and to release once you see the Microsoft logo. This does not work, and only a video showed me that you only let go once you actually enter the BIOS. From this point on, you will start to do things that are risky but that is part and parcel of pirating software. So you turn off SafeBoot, and boot from your USB drive. Install Windows 10 Professional 64-bit, takes like 5 minutes. Once it’s installed, it’s pretty much a brick without its drivers and you can’t access the internet through it because you lack the internet drivers so make sure you have another computer with access to the internet. Download the drivers, place them on a USB drive, and plug that bad boy into your Surface Pro 7 and have the drivers install. There’s quite a bunch that give you the special functionality of the tablet and pen. I immediately use ninite to batch install some staple programs. Though not as much as I’d usually install, considering I’m only using this as a tablet. I should also tell you that this is an even better list of software. 
Some that are must-haves are: Sublime Text 3 : something I am currently using to write this very post! f.lux : just get it, your eyes will thank you. Alacritty : this I highly recommend. It’s a terminal that is GPU-based. Speccy : it’s a good utility to see detailed statistics on every piece of your computer, too bad it fucks up the temperature reading for AMD processors. Malwarebytes : but download the Professional version from your favorite pirate bay. Office 2019 : particularly OneNote is useful for me when I am tutoring online because it’s a whiteboard that updates in real-time for your student to see. Also pirate this. Adobe Photoshop CC 2019: pirate this Adobe Illustrator CC 2019: pirate this CCleaner Professional : pirate this Everything : excellent file search engine Defraggler : excellent defragging software For the Tablet: Bamboo Paper : excellent notebook app Skype for Business: lets you use a whiteboard with skype which is perfect for online tutoring. PDF Reader by Xodo : lets you annotate and write on pdf files which reminds me to say that you should not ever pay for textbooks either. I wrote a little guide on how to find free books online, here it is. Fluidmath Naturplay Calculator There’s a bunch of other apps here. I am only starting to use this tablet so I have not yet explored the true potential of any of these apps nor this device. I must say though that I am pretty impressed with it. The Surface Pro 7 is gorgeous with a 2736x1824 resolution screen. High resolution cannot be understated. It boosts your productivity 1000-fold. The desktop that I’m writing this on has a 4096x2160 resolution, and it’s also connected to another monitor that’s 1920x1080. I sometimes plug in my Surface Pro 7 to this 1080p Samsung TV I have had for like a decade. I also have a Thinkpad T450S that I use when I’m in bed and it’s the one that I experiment on. It’s quite easy to get a used Thinkpad online for like $250 and just replace it with your own SSD and RAM. 
My Thinkpad T450S has a nice SSD and 20GB of RAM (its maximum capacity). This Thinkpad is the latest in a long line of Thinkpads that I’ve been using for the better part of a decade. I have about 5 defunct earlier models somewhere in my closet. They’re pretty much the best laptops that exist, there’s a nice subreddit for it if you want to get some. …

14 May 2020 • on computer set up, hardware, software, piracy
COVID-19 Projects

Introduction: “In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 59,000 scholarly articles, including over 47,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.” The data is located on Semantic Scholar and JHU CSSE What is COVID-19? Symptoms Genomic Sequence What is the current state of COVID-19? Global United States Factors of model parameters S-R trend analysis How can we forecast COVID-19? SIR/SEIR Model Logistic Regression: Sigmoid-Fitting Convolutional Neural Network How can we learn more about COVID-19? Bibliometrics Literature Clustering Therapeutic Findings: Drugs and Vaccines Theory What is R0? Deep Learning Math …

10 May 2020 • on research, SIR, ODE, math, R0
What is R0?

I can’t get LaTeX to work in the markdown files for these posts, but it does work in Jupyter Notebook. This is incomplete. But here: What is R0 Also, here’s some new code. Updated EDA and Sigmoid-Fitting SIR R0 Values COVID-19 Comparisons and CNN Predictions …
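As a rough companion to the R0 write-up, here is a minimal SIR sketch. It assumes the textbook SIR model on population fractions, where R0 = beta / gamma. The parameter values are hypothetical and not fitted to any COVID-19 data, and forward Euler is used only because it is the simplest integrator to read.

```python
def basic_reproduction_number(beta, gamma):
    """R0 for the basic SIR model: transmission rate over recovery rate."""
    return beta / gamma


def simulate_sir(beta, gamma, s0, i0, rec0, days, dt=0.1):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I,
    dR/dt = gamma*I. Returns one (S, I, R) tuple per day, starting with day 0."""
    s, i, r = s0, i0, rec0
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: beta = 0.4/day, gamma = 0.2/day.
beta, gamma = 0.4, 0.2
history = simulate_sir(beta, gamma, s0=0.99, i0=0.01, rec0=0.0, days=100)
peak_day = max(range(len(history)), key=lambda d: history[d][1])

print("R0 =", basic_reproduction_number(beta, gamma))
print("peak infected fraction:", round(history[peak_day][1], 3), "on day", peak_day)
```

When R0 is above 1 the infected fraction grows before it declines; below 1 it only shrinks, which is the threshold behavior the R0 discussion is about.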

9 May 2020 • on research, SIR, ODE, math, R0
COVID-19, Post 7

Forecasting COVID-19 using Logistic Regression: Sigmoid-Fitting
Last Updated: 5/9/2020
Code
Worldwide Trend
Clearly, we have exponential growth. Let’s use a log-scale so we can see the growth of the growth rate. The fatalities curve seems to just be a shift of the curve for confirmed cases, meaning that the mortality rate is almost constant. It seems that the mortality rate peaked at 7% and is now coming back down.
Let’s look at a list of the top 30 countries and their total confirmed cases and fatalities. It seems every other country has reached the flat end of the curve before reaching a quarter-million infected. The US continues to rise above 1.2 million; however, if you look carefully, you can see that the US’s curve is becoming a line. This means its second derivative has dropped to roughly 0, i.e. the number of new cases per day is no longer increasing. We see that in terms of fatalities, Europe is more serious than the US right now.
Let’s look at mortality rate by country, top 30. It seems as though Belgium, France, UK, Italy, Netherlands, Sweden, Hungary, and Spain (European countries) have a 10-16% mortality rate. This is 2-3x as high as the US. Maybe we can look at countries whose mortality rates are low for insight, though many of these countries simply do not report/measure fatalities properly. Either the rates in tropical countries really are low, which would support my hypothesis of a negative correlation between COVID-19 spread and temperature, humidity, # hours of daylight, and wind speed, or, just as likely if not more so, these countries simply do not test as widely as the Western nations.
Clearly, NY has the highest rate. Let’s look at mortality rates. It’s increased to around 6% (at the last update, on April 12th, it was 2%). Let’s look at it by state.
Let’s look at Europe. It seems that the Northern and Eastern regions are doing better than the Western and Southern regions.
However, there seems to be a rise in confirmed cases in Russia, which is a change from 4/12/2020. Spain, Italy, Germany, and France are improving with flattened curves. The UK is starting to become linear. However, there is still what seems like exponential growth in Russia. Spain, Germany, and France all show a daily growth that is greater than Italy’s now. The UK is also reaching a growth rate comparable to Italy’s. These 4 nations are potentially more dangerous than Italy. Italy’s rate of new cases has not been increasing since March 21st, most likely due to Lock-Down procedures.
Let’s take a look at Asia.
Forecasting
Here’s the post where I explain Sigmoid-Fitting …
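The "curve is becoming a line" observation in this post matches the inflection point of a logistic (sigmoid) curve: daily growth peaks there, and the day-over-day change in growth (the second difference) crosses zero. A small sketch with made-up parameters; `capacity`, `rate`, and `midpoint` are hypothetical values, not fitted to any country’s data:

```python
import math


def logistic(t, capacity, rate, midpoint):
    """Cumulative cases on day t under a logistic growth curve."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def second_differences(values):
    """Discrete second derivative: values[k+1] - 2*values[k] + values[k-1]."""
    return [values[k + 1] - 2 * values[k] + values[k - 1]
            for k in range(1, len(values) - 1)]


days = list(range(0, 101))
cases = [logistic(t, capacity=1_000_000, rate=0.15, midpoint=50) for t in days]

# accel[j] is the second difference centered on day j + 1.
accel = second_differences(cases)

for day in (30, 49, 50, 51, 70):
    print(f"day {day}: second difference = {accel[day - 1]:.1f}")
```

Before the midpoint the second difference is positive (growth is accelerating), at the midpoint it is about zero (the curve looks like a straight line), and afterwards it turns negative as the curve flattens.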

9 May 2020 • on research, sigmoid curve fitting, machine learning, eda
COVID-19, Post 6

Latest Code COVID-19 Bibliometrics COVID-19 Drugs and Vaccines Subtyping COVID-19 Therapeutic Research Findings Summary The goal of this exercise is to study this literature provided by the Kaggle COVID-19 challenge organizing team, and to subtype the COVID-19 therapeutic research findings. Specifically, I carried out the following three parts of work: Part A. Drugs that have been used in clinical trials for COVID-19. I identified and characterized the drugs on clinical trials by integrating the FDA drug database and PubChem repository. I hand-curated and summarized the reported effectiveness for each drug. I presented the mutual similarity of chemical structures across the drugs used in clinical trials. I categorized the drugs based on their molecular mechanisms, which may facilitate the discovery of related drugs of similar mechanisms and create effective cocktail treatment: Category 1. RNA mutagens Category 2. Protease inhibitors Category 3. Virus-entry blockers Category 4. Virus-release blockers Category 5. Monoclonal antibodies Part B. Drugs that have been proposed by computational works. I identified the computational publications, categorized their approaches into the following categories and discussed their performance and applications in other disease domains, and potential limitations. Category 1. Gene-gene network-based algorithms. Category 2. Expression-based algorithms Category 3. Docking simulation or protein structure-based for Category 3.a. Small molecules Category 3.b. Monoclonal antibodies Part C. Drugs that have been proposed by in vitro experiments of COVID-19 invading human cells. I characterized the chemical structures and analyzed the chemical similarity for this group. For this list, other than literature mining, I carried out a machine learning experiment to prioritize previously unexplored FDA-approved drugs for repurposing without ADMET evaluation. 
The hypothesis is that drugs that overlap the most globally in functionality to COVID-19 protein interactors are likely to be successful. After hand-removing the contaminations, I identified the following top candidates for repurposing: OLUMIANT(Baricitinib) treating rheumatoid arthritis, BRIMONIDINE, treating glaucoma, EDURANT(rilpivirine) treating Human Immunodeficiency Virus-1 (HIV-1), MARPLAN Treating depression, Corlanor (ivabradine) reduces the spontaneous pacemaker activity of the cardiac sinus node. I listed the potential contaminations/errors in the above candidate proposals. **Summary points and future recommended research topics for Phase 2. ** Conclusion 1. There is not a single drug for which consistent positive response has been reported. Conclusion 2. There are overlaps between the drugs in clinical trials, proposed by computational analysis and proposed by in vitro experiments. However, some of the overlaps, especially those with computational analysis may come from a circularity in the methods. Conclusion 3. Drug candidates proposed by computation and in vitro screening could be biased towards cancer-related targeted therapy and substantially contaminated by existing literature or sometimes anecdote. This bias/contamination may affect a significant number of computation-based drug-repurposing studies including our own work, and certainly not limited to COVID-19. Future direction 1. I did not survey vaccines in this exercise. I think it will be meaningful to make an integrative survey of genome variation and vaccines (or maybe antibodies and drugs as well) into a same topic, therefore allowing connecting the subtypes of genome variations to what fraction of the virus strains that a vaccine could cover. Future direction 2. I suggest a topic on news (e.g., google news) retrieval for therapeutic development, as many (if not most) treatment response may not first appear in manuscripts. 
Finally, I would like to take this opportunity to make one comment: Literature could be biased towards reporting positive results，known biology (e.g., cancer and immuno- drugs), and anecdotes, and I should take the results of this exercise and other documents critically. Part A Subtyping drugs currently in clinical trial A.1 Methods: I first counted how many times each FDA drug occured in the documents provided by Kaggle: A.2 Results A.2.1 The number of publications each drug appeared, top ones, >=100 times, are (full list in sorted_alresult): 103 hydrocortisone 106 ritonavir 111 prednisolone 113 dv 118 ciprofloxacin 119 cyclosporine 127 acyclovir 134 azithromycin 141 amoxicillin 155 doxycycline 159 dexamethasone 166 triad 177 chloramphenicol 177 kanamycin 238 isoflurane 248 gentamicin 370 bal 383 adenosine 436 insulin 480 ribavirin 1767 penicillin A.2.2 the drugs that have been related to coronavirus in literature, and the top ones, >10 times, are (full list in sorted_alresult.coronavirus): 10 times: amoxicillin 10 times: fluorouracil 10 times: kanamycin 12 times: azithromycin 12 times: hydrocortisone 13 times: doxycycline 13 times: levofloxacin 14 dexamethasone 14 isoflurane 15 dv 15 kaletra 15 prednisolone 15 tamiflu 16 cyclosporine 16 gentamicin 18 tao 19 acyclovir 24 triad 25 insulin 35 remdesivir 41 adenosine 60 ritonavir 66 bal 86 penicillin 150 ribavirin A.2.3 The drugs specifically related to COVID-19 in literature (sorted_alresult.covid19) 1 acetaminophen 1 acyclovir 1 amoxicillin 1 antitussive 1 azithromycin 1 bal 1 ceftriaxone 1 chloramphenicol 1 digoxin 1 doxycycline 1 fluorouracil 1 ganciclovir 1 ibuprofen 1 iclusig 1 insulin 1 levofloxacin 1 penicillin 1 sulfasalazine 1 tigecycline 2 adenosine 2 triad 3 darunavir 4 tao 7 kaletra 12 ribavirin 17 remdesivir 22 ritonavir A.2.4 Literature summary After hand-removing the irrelevant ones, the drugs can be roughly categorized by their effective mechanisms into: Group and Mechanism Popular Drugs in Trials 
RNA mutagens that stop the copying of the virus Remdesivir, Favipiravir, Fluorouracil, Ribavirin, Acyclovir Protease inhibitors that block the multiplication of the virus Ritonavir, Lopinavir, Kaletra, Darunavir Stopping the entry of the virus into the host cell Arbidol, Hydroxychloroquine, Chloroquine phosphate Stopping the release of the virus from the host cell Oseltamivir Monoclonal antibodies targeting a virus protein/epitope IL-6 monoclonal antibody, Spike (S) protein antibody A.2.4.1 RNA mutagens Viruses need to copy themselves in order to invade the host and transmit (like cancer cells), thus it makes sense that mutagens that block the copying can be used as drugs. Remdesivir: It was studied in many publications related to coronavirus. It was suggested to be highly effective in the control of 2019-nCoV infection in vitro, while their cytotoxicity remains in control (0562f70516579d557cd1486000bb7aac5ccec2a1.json, 95cc4248c19a3cc9a54ebcfa09fc7c80518dac5d.json). It was also reported to significantly reduce lung viral load in mice and with successful clinical cases (0562f70516579d557cd1486000bb7aac5ccec2a1.json, 49ac69f362c27acbc6de0c5cbb640267e7a1e797.json). In clinical settings, it has been used as compassionate treatment. Other papers, e.g., 3e9ae5329eecab16d7c39f1f6dc778cf4a53ee0d.json, suggest the effect is still to be verified. Favipiravir: It was suggested to be a good candidate (58be092086c74c58e9067121a6ba4836468e7ec3.json). It has been used in trials to treat SARS-CoV-2 infections, while the scores of favipiravir docking with the targets in some virtual screenings are relatively low (based on a computation study 95cc4248c19a3cc9a54ebcfa09fc7c80518dac5d.json) Fluorouracil: The RNA mutagen 5-fluorouracil (5-FU) treatment will also increase the U:C and A:G transitions. Ribavirin: It was suggested to be useful for MERS (e5f19b6daf956e815c779228cc0cad1293d65bbb.json). 
It has been reported to reduce death rate in COVID-19 patients: f294f0df7468a8ac9e27776cc15fa20297a9f040.json.
Acyclovir: No statistical difference in treatment effect (baabfb35a321ea12028160e0d2c1552a2fda2dd5.json).
**A.2.4.2 Protease inhibitors**
Ritonavir: It was suggested to inhibit proteases and thus block multiplication of the virus. It was reported to deliver a substantial clinical benefit for COVID-19 patients (0562f70516579d557cd1486000bb7aac5ccec2a1.json), and its effectiveness is suggested by computational docking studies (9e94f9379fd74fcacc4f3a57e03cbe9035efee8e.json), while other clinical studies showed no effect at all or ‘failed’ treatment (24e17488d399c436305c819953beae2961214771.json, 8349823092836fe397a59e38615d1491423dbe70.json). Previously, it was shown to be beneficial for treating SARS and MERS (3afd5fba7dc182ddfa769c0d766134b525581005.json).
Lopinavir: Lopinavir is a protease inhibitor. It was reported to have substantial benefit for treating COVID-19 patients (0562f70516579d557cd1486000bb7aac5ccec2a1.json). Most studies consider Lopinavir a potential candidate.
Kaletra: It is the combination of Ritonavir and Lopinavir.
Darunavir: The drug was suggested to be potentially beneficial by computational docking experiments (9e94f9379fd74fcacc4f3a57e03cbe9035efee8e.json), and in vivo studies (95cc4248c19a3cc9a54ebcfa09fc7c80518dac5d.json).
**A.2.4.3 By stopping the entry of the virus into the host cell**
Arbidol: It inhibits membrane fusion between virus particles and plasma membranes, but it shows no statistical difference in treating COVID-19 patients (baabfb35a321ea12028160e0d2c1552a2fda2dd5.json).
Hydroxychloroquine, Chloroquine phosphate: Some studies also suggest that hydroxychloroquine works by blocking the entry of the virus, though the exact mechanism is unknown (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102587/).
Chloroquine effectively inhibited SARS-CoV-2 in vitro (58be092086c74c58e9067121a6ba4836468e7ec3.json). Chloroquine phosphate was reported to have apparent efficacy and acceptable safety against COVID-19 in multicenter clinical trials (462cbb326ccd8587cae7a3538c8c6712d9013698.json, b70d27459fd8143edf76721da40cdbca399c9fb1.json). Chloroquine has recently been written into the official recommendation for empirical therapy of COVID-19 because of its adequate safety data in humans (0562f70516579d557cd1486000bb7aac5ccec2a1.json).
A.2.4.4 By stopping the release of the virus from the host cell
Oseltamivir: Tamiflu, an inhibitor of the neuraminidase enzyme; no statistical difference in treating COVID-19 (baabfb35a321ea12028160e0d2c1552a2fda2dd5.json).
The other drugs in the list are irrelevant in this context of effectiveness. Some are related to tests of toxicity.
A.2.4.5 By generating monoclonal antibodies targeting certain proteins of the virus
IL-6 monoclonal antibody: the IL-6 monoclonal antibody-directed COVID-19 therapy has been used in a clinical trial in China (No.ChiCTR2000029765) (7852aafdfb9e59e6af78a47af796325434f8922a.json, c8d206a4f9af0709b6e9ee90c4d854d482cb0784.json). IL-6 level was suggested to serve as an indicator of poor prognosis, and the antibody was suggested for use in these patients (c8437a45bfb84fb206fe03fd18d28858bae32651.json).
Spike (S) protein antibody: It was suggested that a monoclonal antibody against the S protein may efficiently block the virus from entering the host (c8437a45bfb84fb206fe03fd18d28858bae32651.json).
Note: some other drugs, though used to treat COVID-19, are not relevant to the discussion. For example, broad-spectrum antibiotics or fever reducers are often used in the control arm.
A.3. Limitations
The above analysis has the following limitations: I used a rather early version of the literature set (because the searching step took quite a long time), and some popular drugs, e.g.
hydroxychloroquine are only discussed but without clear clinical conclusion yet. Literature could be substantially biased towards positive results and by computational methods (discussed below). Part B Subtyping computational approaches that are used to propose drug candidates I then subtyped computational methods developed to repurposing drugs for COVID-19. B.1 Methods During reading the literature curated in Part A, I came across computational studies that focus on predicting drugs suitable for repurposing for COVID-19. These works tend to propose many drugs. B.2 Results B.2.1 Gene-gene network-based approaches Example: https://www.nature.com/articles/s41421-020-0153-3 repurposed drugs by network approaches based on homology analysis to other viruses. The authors proposed 16 potential drugs: Irbesartan, Torernifene, Camphor, Equilin, Mesalazine, Mercaptopurine, Paroxetine, Sirolimus, Carvedilol, Colchicine, Dactinomycin, Melatonin, Quinacrine, Eplerenone, Emodin, Oxymetholone. Background: Network-based drug response has been intensively used in the cancer area and was shown to excel in several benchmarks. B.2.2 Expression-based approaches Example: https://arxiv.org/abs/2003.14333 repurposed drugs for treating lung injury in COVID-19 by ‘could best reverse abnormal gene expression caused by (SARS)-CoV-2-induced inhibition of ACE2 in lung cells,’ an effective drug treatment is one that reverts the aberrant gene expression back to the normal levels’. The authors proposed the following drugs’: geldanamycin, panobinostat, trichostatin A, narciclasine, COL-3 and CGP-60474. B.2.3 Docking or structural-based approaches B.2.3.1 Small molecule prediction Example 1: https://www.biorxiv.org/content/10.1101/2020.03.03.972133v1.full ‘a novel advanced deep Q-learning network with the fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro’ Prioritized 48 candidates by docking (supplement Table S1). 
Example 2: https://www.sciencedirect.com/science/article/pii/S2211383520302999 studied the proteins encoded by SARS-CoV-2 genes, compared them with proteins from other coronaviruses, predicted their structures, and built 19 structures that could be done by homology modeling, Library of ZINC drug database, natural products, 78 anti-viral drugs were screened against these targets plus human ACE2. Prioritized the hundreds of drugs, ranked by docking scores: e.g., Ribavirin, alganciclovir, β-Thymidine, Platycodin D, Chrysin,Neohesperidin, Lymecycline, Chlorhexidine, Alfuzosin, Betulonal, Valganciclovir, Chlorhexidine, Betulonal, Gnidicin. B.2.3.2 Monoclonal antibody prediction Example 1: docking-based proposal of antibodies https://www.biorxiv.org/content/10.1101/2020.02.22.951178v1.full.pdf The neutralizing antibodies are proposed by computationally docking to the S protein of COVID-19 by docking simulation. Example 2: ACE2 pathway-based proposal of antibodies https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079879/ Potential therapeutic approaches include a SARS-CoV-2 spike protein-based vaccine; a transmembrane protease serine 2 (TMPRSS2) inhibitor to block the priming of the spike protein; blocking the surface ACE2 receptor by using anti-ACE2 antibody or peptides; and a soluble form of ACE2 which should slow viral entry into cells through competitively binding with SARS-CoV-2 and hence decrease viral spread as well as protecting the lung from injury through its unique enzymatic function. MasR-mitochondrial assembly receptor, AT1R-Ang II type 1 receptor. Background: Docking has been used intensively in drug discovery in areas such as cancers. B.3 Limitations Computationally proposed drugs tend to be a lot in a single piece of article, sometimes, hundreds of drugs in a single study. Most of the works adopted methods from other pharmacogenomics field that were previously developed for cancers. 
I am not aware that these approaches have generated hypotheses that are used in real-world clinical trials, even in popular fields, e.g. cancer and Alzheimer’s. Thus, use them with caution.
Part C. Drugs proposed by in vitro experiments
C.1 Methods
C.1.1 Data curation
Other than the drugs used in clinical trials and those proposed by computational methods, I found an interesting study that carried out genome-wide in vivo binding screening of the virus proteins and human proteins, and proposed 37 drugs that directly target these proteins in supplementary table 6 of Gordon et al (https://www.biorxiv.org/content/10.1101/2020.03.22.002386v1.supplementary-material?versioned=true). These drugs are currently being screened by the authors: Loratadine, Daunorubicin, Midostaurin, Ponatinib, Silmitasertib, Valproic Acid, Haloperidol, Metformin, Migalastat, S-verapamil, Indomethacin, Ruxolitinib, Mycophenolic acid, Entacapone, Ribavirin, E-52862, Merimepodib, RVX-208, XL413, AC-55541, Apicidin, AZ3451, AZ8838, Bafilomycin A1, CCT365623, GB110, H-89, JQ1, PB28, PD-144418, RS-PPCC, TMCB, UCPH-101, ZINC1775962367, ZINC4326719, ZINC4511851, ZINC95559591.
C.1.2 Construction of training set
I carried out a machine learning exercise, with the hypothesis that drugs that are potentially effective should broadly overlap in function with the drugs that hit these targets. I could extract the chemical structure of 34 of the 37 drugs proposed by the authors, and these are used as the first positive set. The second positive set is the combination of the first positive set and four other drugs that are currently in clinical trials and whose chemical structure can be extracted: remdesivir, hydroxychloroquine, favipiravir and Vitamin C, giving 38 in total. The negative training set, which is also the candidate set, is constructed from the FDA approved list, which was downloaded in Oct 2019 from https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm.
This list has a total of 7305 drugs, for 5596 of which I could obtain a structural fingerprint.
C.2 Results
C.2.1 Top candidates in FDA approved drugs
Among the FDA approved drugs, I identified some top candidates that do not appear in the training gold standard. I hand-searched the literature for each of the top candidates with a probability >0.05 (55 in total). Most of them turned out to be contaminations, i.e., they overlap with an example in the training set even though the drug appears under a different name. Cleaned-up list:

| Drug name | Original usage | Potential issues in the candidate |
| --- | --- | --- |
| OLUMIANT (Baricitinib) | Janus kinase (JAK) inhibitor | |
| MEKTOVI | Targeted therapy to treat BRAF V600E or V600K cancers | May come from bias in cancer targeted therapy/screening |
| BRIMONIDINE | Treating glaucoma | |
| CAPRELSA | Kinase inhibitor, medullary thyroid cancer (MTC) | May come from bias in cancer targeted therapy/screening |
| EDURANT (rilpivirine) | Treating Human Immunodeficiency Virus-1 (HIV-1) | |
| MARPLAN | Treating depression | Some schizophrenia drugs are used in the protein interaction training set, and might result in an implicit contamination here |
| Corlanor (ivabradine) | Reduces the spontaneous pacemaker activity of the cardiac sinus node | |
| LORBRENA | Kinase inhibitor, ALK mutant cancer | May come from bias in cancer targeted therapy/screening |
| BRAFTOVI | Kinase inhibitor, metastatic melanoma | May come from bias in cancer targeted therapy/screening |
| TAVALISSE | Kinase inhibitor indicated for the treatment of thrombocytopenia | May come from bias in cancer targeted therapy/screening |

C.3 Limitations and biases in the finding
First, drugs proposed by in vitro or computational protein-target/gene-gene network approaches are definitely biased towards targeted therapies in cancers, because these drugs were intensively screened in cell line experiments. This is true for the above list, probably for the original list proposed through the binding experiments, and certainly for other studies.
Second, low scores only mean that the drugs are not similar to the others being investigated in the study, not that they are not useful. Remdesivir had a high score of 0.09 (I am not sure if this is an implicit contamination from the training set); the others, including Vitamin C, hydroxychloroquine and favipiravir, had low scores. …
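The candidate screening in C.1.2 and the contamination clean-up in C.2.1 can be illustrated with a small sketch. This is not the model used in the post, which is not described in detail here. It assumes each fingerprint is a set of "on" bit indices, scores a candidate by its maximum Tanimoto similarity to the positive set (a stand-in for the classifier probability), and treats a candidate as a contamination when its fingerprint is identical to a positive example regardless of its name. The drug names, fingerprints, the `rank_candidates` helper and the 0.5 cutoff are all made up for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(
    positives: Dict[str, Fingerprint],
    candidates: Dict[str, Fingerprint],
    cutoff: float,
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Hypothetical helper: return (ranked hits, contaminations).

    A candidate whose fingerprint equals a positive fingerprint is a
    contamination (same structure under a different name) and is not
    scored. Remaining candidates are scored by their maximum similarity
    to any positive and kept if the score exceeds the cutoff.
    """
    positive_prints = set(positives.values())
    hits: List[Tuple[str, float]] = []
    contaminations: List[str] = []
    for name, fp in candidates.items():
        if fp in positive_prints:
            contaminations.append(name)
            continue
        score = max((tanimoto(fp, p) for p in positive_prints), default=0.0)
        if score > cutoff:
            hits.append((name, score))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits, contaminations


# Toy data, invented for this example only.
positives = {
    "positive_a": frozenset({1, 2, 3, 4}),
    "positive_b": frozenset({10, 11, 12}),
}
candidates = {
    "brand_name_of_a": frozenset({1, 2, 3, 4}),  # same structure, other name
    "near_a": frozenset({1, 2, 3, 5}),           # shares 3 of 5 bits with positive_a
    "unrelated": frozenset({20, 21}),            # no shared bits
}

hits, contaminations = rank_candidates(positives, candidates, cutoff=0.5)
print("hits:", hits)
print("contaminations:", contaminations)
```

Matching on the fingerprint rather than the name is what catches a brand name that duplicates a training example. It would miss salts or formulations of the same drug whose fingerprints differ slightly, so a hand check of the top candidates, as done above, is still needed.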

18 Apr 2020 • on research, drug discovery, therapeutic, PCA, machine learning